ON SOME PROPERTIES OF NORMAL DISTRIBUTIONS,
UNIVARIATE AND BIVARIATE, BASED ON SUMS OF 
SQUARES OF FREQUENCIES 


By G. UDNY YULE, F.R.S. 
To the memory of Karl Pearson’s lectures of 1894-5


IN my notes of the lectures given by Karl Pearson in November 1894* there 
occurs the following entry, the grammar of which I fear is my own: 


NORMAL CURVE
On the determination of the Standard Deviation.... 
Suppose you are given an observation-frequency-polygon and wish to find the standard 
deviation. 
We might proceed (1) by determining the absolute position of the centroid (2) by de- 
termining the swing radius about the centroid vertical. 


The second method is merely the expression in Pearson’s terminology of 
that date for calculating the standard deviation from the second moment. About 
the first suggestion for calculating it from “the absolute position of the centroid”, 
i.e. from the co-ordinates of the mean $\bar{x}$ and $\bar{y}$, he apparently said little more
and the notion was not developed, at least in the lectures. But it seems worth
a little development, which exhibits interesting properties of normal distributions.
$\bar{y}$ is evidently given by $\frac{1}{2N_1}\int y^2\,dx$, where $N_1$ is the total frequency, and it is
this integral, or the equivalent series, that we have to consider.


1. UNIVARIATE DISTRIBUTIONS 


Let the distribution of the normal variate $x$, with total frequency $N_1$, be given by

$$y = \frac{N_1}{\sigma\sqrt{2\pi}}\,e^{-x^2/2\sigma^2}. \qquad (1)$$

Squaring, we have

$$y^2 = \frac{N_1^2}{2\pi\sigma^2}\,e^{-x^2/\sigma^2}. \qquad (2)$$

* In the Library of the Department of Statistics at University College, London, vol. 1, p. 66:
see the abstract in an article in the Miscellanea section of the present issue of this Journal.


This again is a normal distribution, with standard deviation $\sigma/\sqrt{2}$, the integral
with respect to $x$ being $N_1^2/(2\sigma\sqrt{\pi})$. Hence, writing $N_2$ for the sum of squares of
class frequencies, which may be taken as equivalent to the integral for small
values of the class interval, we may estimate $\sigma$ by the relation

$$\sigma = \frac{N_1^2}{2\sqrt{\pi}\,N_2} = 0.282\,0948\,\frac{N_1^2}{N_2}. \qquad (3)$$

The unit is obviously the class interval. This result is interesting and looks odd
at first sight as one does not appear to be introducing $x$ at all. But $y$ (observations
per interval) is of dimensions $1/\sigma$, $y^2$ is of dimensions $1/\sigma^2$, and the integral
of $y^2$ with respect to $x$ is therefore again of dimensions $1/\sigma$.
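As a minimal sketch (the function names are hypothetical and not part of the original text), equation (3) can be evaluated either from the two sums $N_1$ and $N_2$ or directly from a list of class frequencies. The figures in the usage line are the sums quoted in the text for the B.A. stature data of example (1).

```python
import math

def sigma_from_sums(n1, n2):
    """Equation (3): sigma = N1^2 / (2 sqrt(pi) N2), in class intervals."""
    return n1 * n1 / (2.0 * math.sqrt(math.pi) * n2)

def sigma_from_frequencies(freqs):
    """Form N1 (total frequency) and N2 (sum of squared frequencies), then apply (3)."""
    n1 = sum(freqs)
    n2 = sum(m * m for m in freqs)
    return sigma_from_sums(n1, n2)

# Sums quoted in the text for the B.A. stature data: N1 = 8585, N2 = 8,144,417.
print(round(sigma_from_sums(8585, 8144417), 3))
```

Only the frequencies enter the calculation; the positions of the classes on the scale are never used, which is the feature the paragraph above remarks on.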

Let us call the estimate of the standard deviation by equation (3) $\sigma_f$, to
distinguish it from the true value $\sigma$ determined from the second moment, and
let us compare the values of $\sigma$ and $\sigma_f$ for some actual distributions, both normal
or near normal and more or less widely divergent from normality.


(1) Stature of males in the U.K. (British Association: cited in Yule and
Kendall, Introduction to the Theory of Statistics, 11th ed. (1937), p. 94).
$\beta_1$ = 0.000, $\beta_2$ = 3.15. Nearly normal but slightly leptokurtic. $\sigma$ = 2.572 intervals
or inches, or 2.556 using Sheppard’s correction. $N_1$ = 8585, $N_2$ = 8,144,417,
$\sigma_f$ = 2.553, a negative error of only a fraction of 1% on either value of $\sigma$.


(2) Stature of fathers (Karl Pearson’s table for correlation of stature in
father and son, cited Yule and Kendall (1937), p. 199). $\beta_1$ = 0.014, $\beta_2$ = 2.91.
Near normal, but slightly platykurtic. $\sigma$ = 2.72 intervals or inches, or 2.70
corrected. $N_1$ = 1078, $N_2$ = 120,007, $\sigma_f$ = 2.73, a positive error of some 1% on
the corrected, one-third of 1% on the uncorrected value of $\sigma$.

(3) Stature of sons (as for (2) above). $\beta_1$ = 0.049, $\beta_2$ = 3.30. Near normal, but
leptokurtic. $\sigma$ = 2.75 intervals or inches, or 2.73 corrected. $N_1$ = 1078,
$N_2$ = 122,805.5, $\sigma_f$ = 2.67, error -2.9 or -2.2%.

(4) Actuarial (Elderton, Frequency Curves and Correlation (1927), p. 66,
(1938), p. 68). $\beta_1$ = 0.005, $\beta_2$ = 3.17. Near normal, slightly leptokurtic. $\sigma$ = 2.147
uncorrected, 2.128 corrected. $N_1$ = 9154, $N_2$ = 11,058,544, $\sigma_f$ = 2.138, error -0.4
or +0.5%.

(5) Actuarial (Elderton (1927), p. 75, (1938), p. 77). $\beta_1$ = 0.995, $\beta_2$ = 4.74.
Skew and highly leptokurtic. $\sigma$ = 1.006 uncorrected, 0.964 corrected. $N_1$ = 368,
$N_2$ = 41,872, $\sigma_f$ = 0.912, error -10.2 or -5.4%.

(6) Ages of men at marriage in Australia (Biometrika, 22 (1930), 210; cited in
Yule and Kendall (1937), p. 96). $\beta_1$ = 3.85, $\beta_2$ = 8.33. Very skew and very highly
leptokurtic. $\sigma$ = 2.673 intervals (3 years), or 2.657 corrected. $N_1$ = 301,785,
$N_2$ = 14,290,993,813, $\sigma_f$ = 1.798, error roundly -33%.














(7) Symmetrical binomials $(1+1)^n$. Platykurtic.

    n     $\beta_2$     $\sigma$       $\sigma_f$     Error %
    2     2        0.707     0.752     +6.4
    4     2.5      1         1.032     +3.2
    6     2.67     1.225     1.250     +2.0
    8     2.75     1.414     1.436     +1.6

(8) Skew binomial $(2+1)^8$. $\beta_1$ = 0.0625, $\beta_2$ = 2.81. Skew and platykurtic.
$\sigma$ = 1.333, $\sigma_f$ = 1.349, error +1.2%.

(9) Sampling distribution of correlations from conjunct series (Yule, J. Roy.
Statist. Soc. 89 (1926), 34). $\beta_1$ = 0.003, $\beta_2$ = 1.915. Highly platykurtic, nearly
symmetrical, looks almost semi-elliptical. $\sigma$ = 5.128 intervals uncorrected.
$N_1$ = 600, $N_2$ = 19,696, $\sigma_f$ = 5.156, error +0.55%.

(10) A rectangle, length $l$. $\beta_2$ = 1.8. Highly platykurtic. $\sigma = l/\sqrt{12} = 0.2887\,l$.
$\sigma_f = l/(2\sqrt{\pi}) = 0.2821\,l$. Error -2.3%.
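The binomial and rectangular cases in the list above are easy to recompute. The following is a small sketch with hypothetical helper names; it compares the second-moment standard deviation with the estimate from equation (3) for the symmetrical binomials $(1+1)^n$, and percentages computed from unrounded values may differ in the last figure from those tabulated.

```python
import math

def sigma_moment(freqs):
    """Standard deviation from the second moment, in class intervals (uncorrected)."""
    n1 = sum(freqs)
    mean = sum(i * m for i, m in enumerate(freqs)) / n1
    var = sum(m * (i - mean) ** 2 for i, m in enumerate(freqs)) / n1
    return math.sqrt(var)

def sigma_squares(freqs):
    """Estimate from equation (3), using only N1 and N2."""
    n1 = sum(freqs)
    n2 = sum(m * m for m in freqs)
    return n1 * n1 / (2.0 * math.sqrt(math.pi) * n2)

# Symmetrical binomials (1 + 1)^n: the frequencies are the binomial coefficients.
for n in (2, 4, 6, 8):
    freqs = [math.comb(n, k) for k in range(n + 1)]
    s, sf = sigma_moment(freqs), sigma_squares(freqs)
    print(n, round(s, 3), round(sf, 3), round(100.0 * (sf / s - 1.0), 1))
```

For a rectangle spread evenly over $r$ classes the same function returns $r/(2\sqrt{\pi})$ intervals, which is the value $l/(2\sqrt{\pi})$ of example (10).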

The error of $\sigma_f$ varies, as one would expect, with the form of the distribution.
For leptokurtic distributions the error is negative without exception if comparison
is made with the uncorrected $\sigma$ (the better comparison, see below on the effect
of grouping), i.e. $\sigma_f$ gives too low a value; while for platykurtic distributions
the value is usually too high. This again is as one would expect; for the present
method attaches very little weight to the small frequencies in the long tails of
a leptokurtic distribution, whereas the method of moments weights them
heavily, and hence the wide divergence between $\sigma$ and $\sigma_f$ in example (6). But
what is remarkable, and was certainly unexpected to me, is that while $\sigma_f$ seems
very sensitive to small divergences from normality in the direction of leptokurtosis,
witness the error in example (3), it is comparatively insensitive to the
widest divergences in the direction of platykurtosis. Nothing can be much more
platykurtic than a rectangular distribution, but the error of $\sigma_f$ is no more than
2.3% (example 10): a distribution that looks almost semi-elliptical to the eye
(example 9) is a long way from normality, but the error is no more than 0.55%.
The closeness of approximation in these cases puzzles me, and makes it evident
that comparison of $\sigma_f$ with $\sigma$ is of little use as a test of normality.


Distributions occasionally occur in which a number of wild outliers, far
beyond the $\pm 3\sigma$ range, seem to be superposed on a distribution otherwise
normal or near it. It occurred to me as possible that the low weight attached to
small frequencies by the method of equation (3) might render it useful in such
cases for approximating to the value of the standard deviation for the bulk of
the distribution. Thus I compiled, from the table of areas of the normal curve,
a normal distribution of 1000 observations with intervals of $0.4\sigma$, making
$\sigma$ = 2.5. Superposing on this 10 outliers, one each in the intervals centering
round $\pm 3.6\sigma$ to $\pm 5.2\sigma$, the standard deviation is raised to 2.719, an increase
of 8.8%. The present method gives a value of 2.557, only 2.3% above the
standard deviation of the normal curve forming the bulk of the distribution.
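The outlier experiment can be imitated as follows. This is a sketch under stated assumptions, not a reproduction of the author's worksheet: the frequencies are left unrounded, the bulk is taken over classes centred from $-3.2\sigma$ to $+3.2\sigma$, and all names are hypothetical, so the figures agree with those in the text only roughly.

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def grouped_normal(total=1000.0, width=0.4, kmax=8):
    """Frequencies in classes centred at k*width (sigma units), k = -kmax..kmax."""
    return {k: total * (normal_cdf((k + 0.5) * width) - normal_cdf((k - 0.5) * width))
            for k in range(-kmax, kmax + 1)}

def sigma_moment(freqs):
    """Second-moment standard deviation in class intervals (no Sheppard correction)."""
    n1 = sum(freqs.values())
    mean = sum(k * m for k, m in freqs.items()) / n1
    return math.sqrt(sum(m * (k - mean) ** 2 for k, m in freqs.items()) / n1)

def sigma_squares(freqs):
    """Equation (3), in class intervals."""
    n1 = sum(freqs.values())
    n2 = sum(m * m for m in freqs.values())
    return n1 * n1 / (2.0 * math.sqrt(math.pi) * n2)

bulk = grouped_normal()
contaminated = dict(bulk)
for k in range(9, 14):  # classes centred at 3.6, 4.0, ..., 5.2 sigma on each side
    contaminated[k] = contaminated.get(k, 0.0) + 1.0
    contaminated[-k] = contaminated.get(-k, 0.0) + 1.0

print(round(sigma_moment(bulk), 3), round(sigma_moment(contaminated), 3))
print(round(sigma_squares(bulk), 3), round(sigma_squares(contaminated), 3))
```

The ten added observations raise $N_2$ by only 10 while adding heavily to the second moment, which is why the two estimates respond so differently.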
It also suggested itself that one might proceed further on the same lines,
utilizing higher powers of the class frequencies. Denoting these by $m$, form the
quantities

$$N_3 = S(m^3), \qquad N_4 = S(m^4), \qquad (4)$$

and derive an estimate of the standard deviation from equations corresponding
to (3), but based on $N_3$ or $N_4$, viz.

$$\sigma_f^2 = \frac{1}{2\pi\sqrt{3}}\,\frac{N_1^3}{N_3} = 0.091\,8882\,\frac{N_1^3}{N_3}, \qquad (5)$$

or

$$\sigma_f^3 = \frac{1}{2\,(2\pi)^{3/2}}\,\frac{N_1^4}{N_4} = 0.031\,7468\,\frac{N_1^4}{N_4}. \qquad (6)$$

The convergence towards the standard deviation of the bulk of the distribution
given by (3), (5) and (6) seems, however, to be slow: (5) gives 2.547 and (6) gives
2.543, thus bringing the error down to 1.7%, but not lower. As a further
illustration, it may be of interest to bring together the values of $\sigma_f$ given by the
three equations for the stature distributions of examples 2 and 3:

                    Example 2    Example 3
                     Fathers       Sons
    Equation (3)      2.732        2.669
    Equation (5)      2.716        2.639
    Equation (6)      2.755        2.617

The slightly platykurtic distribution for fathers shows only erratic small changes
in the second place of decimals: the rather leptokurtic distribution for sons
gives a steady decrease in $\sigma_f$ as we pass from $N_2$ as a basis to $N_4$.
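Equations (3), (5) and (6) are the cases $p$ = 2, 3, 4 of a single relation, $\sigma^{p-1} = N_1^p / \{(2\pi)^{(p-1)/2}\sqrt{p}\,N_p\}$, which follows from integrating the $p$th power of the normal ordinate. The sketch below uses that general form, which is an extrapolation not written out in the text, and checks it on an assumed test distribution (a grouped normal curve with $\sigma$ = 5 intervals); the names are hypothetical.

```python
import math

def sigma_from_power_sum(freqs, p):
    """sigma^(p-1) = N1^p / ((2 pi)^((p-1)/2) * sqrt(p) * N_p), N_p = sum of p-th powers."""
    n1 = sum(freqs)
    n_p = sum(m ** p for m in freqs)
    value = n1 ** p / ((2.0 * math.pi) ** ((p - 1) / 2.0) * math.sqrt(p) * n_p)
    return value ** (1.0 / (p - 1))

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Assumed test data: 1000 observations, normal, sigma = 5 class intervals.
sigma = 5.0
freqs = [1000.0 * (normal_cdf((k + 0.5) / sigma) - normal_cdf((k - 0.5) / sigma))
         for k in range(-30, 31)]

for p in (2, 3, 4):
    print(p, round(sigma_from_power_sum(freqs, p), 3))
```

For a strictly normal distribution all three powers return nearly the same value; they separate only when the form departs from normality, as in the fathers and sons comparison above.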

Effect of grouping. As a matter of practice, it is of interest to enquire what
is the effect on the result given by equation (3) of increasing coarseness of 
grouping. To throw some light on this, I tried the effect, on the B.A. stature 
data of example 1, of altering the interval, (i) to 2 in., and (ii) to 3 in., and as 
a check made two different groupings with each of these intervals by starting 
the scale at a different point. Converting all the standard deviations back from 
intervals to inches, the results were as follows: 





    Interval       $\sigma$             $\sigma$           $\sigma_f$ by
      in.      uncorrected     corrected     equation (3)
       1          2.572          2.556          2.553
       2          2.622          2.…            2.611
                  2.619          2.5…           2.597
       3          2.702          2.559          2.698
                  2.705          2.563          2.691

The first column of figures exhibits the well-known effect of increasing coarseness
of grouping, the values of the uncorrected standard deviation rising with each 
increase in the interval. The figures in the second column are a testimonial to 
the efficiency of Sheppard’s correction. Turning to the last column, we see that 
here also coarseness of grouping increases the value found; and, very un- 
expectedly, the increase runs closely parallel to that of the uncorrected standard 
deviations in the first column. I think we may conclude that the effect of
grouping on the values given for $\sigma_f$ by equation (3) is similar to the effect on
uncorrected standard deviations calculated from the second moment and, not
to put it too precisely, of much the same magnitude. Hence, as suggested, it
is better in the above examples to compare $\sigma_f$ with the uncorrected $\sigma$, both
being similarly affected by grouping.
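The corrected column of the grouping table follows from the uncorrected one by Sheppard's correction, which subtracts one-twelfth of the squared class interval from the variance. A brief sketch, with a hypothetical function name and both quantities expressed in inches:

```python
import math

def sheppard_corrected(sigma_uncorrected, interval):
    """Subtract interval^2 / 12 from the variance (same unit for both arguments)."""
    return math.sqrt(sigma_uncorrected ** 2 - interval ** 2 / 12.0)

# Uncorrected values and class intervals (inches) taken from the table in the text.
for sigma, h in [(2.572, 1), (2.702, 3), (2.705, 3)]:
    print(h, sigma, round(sheppard_corrected(sigma, h), 3))
```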

A measure of “concentration”. While equation (3) continues to give not 
unreasonable approximations even when the “humped” or “cocked hat” 
distribution passes through a semi-elliptical to the rectangular form, if it passes 
further through this form to a double-humped or U-shape, the equation will 
obviously lead to absurd results. If the U-shape is symmetrical, the standard 
deviation will be the greater the larger the proportion of the total frequency 
crowded into the extreme compartments: but the larger this proportion, the 
smaller will be the ratio $N_1^2/N_2$ and the smaller $\sigma_f$.

The ratio $N_2/N_1^2$ will, however, retain a meaning of its own in all cases.
If we write

$$k = \frac{N_2}{N_1^2}, \qquad (7)$$

then $k$ may be called the concentration, as its magnitude indicates the extent
to which frequency is piled on to a few intervals, without respect to the relative
positions of those intervals, whether near the middle or towards the ends of
the range of variation. If all the observations are piled on to a single interval,
$k$ will become equal to unity, its highest possible value: if they are divided
equally between $r$ intervals, $k = 1/r$. For the three distributions of stature the
values of the “concentration” are

Example 1: 0.1105; Example 2: 0.1033; Example 3: 0.1057;

so that the concentration is equivalent to that of a uniform (rectangular)
distribution over 9 or 10 intervals.
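The concentration of equation (7) is a one-line computation. The sketch below (hypothetical names) checks the two limiting cases described above and the three stature values from the sums quoted earlier in the text.

```python
def concentration(freqs):
    """Equation (7): k = N2 / N1^2, from a list of class frequencies."""
    n1 = sum(freqs)
    n2 = sum(m * m for m in freqs)
    return n2 / (n1 * n1)

def concentration_from_sums(n1, n2):
    return n2 / (n1 * n1)

print(concentration([0, 12, 0]))   # all observations in one interval
print(concentration([3] * 8))      # equal division between 8 intervals
for n1, n2 in [(8585, 8144417), (1078, 120007), (1078, 122805.5)]:
    print(round(concentration_from_sums(n1, n2), 4))
```

The reciprocal $1/k$ gives the number of intervals of an equivalent rectangular distribution, here between 9 and 10.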

The form of the sampling distribution of the correlation coefficient for
samples from a normal uncorrelated universe changes from a U-shape with
maximum frequencies at the ends of the range when the number $n$ in the sample
is 3, to a rectangular distribution when $n$ = 4. For $n$ = 5 the distribution is
“arched” rather than “humped”, and as $n$ is further increased the distribution
slowly develops tails and begins to approach the normal form with the mode
at zero. The values of the concentration run approximately as follows, the
interval being 0.1.*

     n     $\sigma$ (intervals)      k
     3        7.071         0.0710
     4        5.774         0.05
     5        5             0.0539
     6        4.472         0.0599
     7        4.082         0.0657
     8        3.780         0.0713
     9        3.536         0.0764
    10        3.333         0.0813


These figures are interesting; for, while the standard deviation continuously
falls as $n$ increases, the concentration (owing to the changing form of the
distribution) at first falls and then slowly rises, until when $n$ = 8 it attains a
value only very slightly greater than that for $n$ = 3. This brings out clearly the
difference between $k$ and a measure of “precision” such as the reciprocal of
the standard deviation, $k$ paying no regard to the position on the scale of the
intervals of those on which the concentration mainly falls. For $n$ = 3 these are
the terminal intervals, for $n$ = 8 the central intervals, but in both cases the
distribution is about as concentrated as if the frequency were uniformly
distributed over 14 intervals (1/0.071 = 14 approximately).
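The concentrations for the distribution of $r$ can be approximated without tables. The sketch below assumes the standard result, not stated in the text, that for samples of $n$ from an uncorrelated normal universe the frequency of $r$ is proportional to $(1-r^2)^{(n-4)/2}$; with $r = \sin\theta$ the class areas become integrals of $\cos^{n-3}\theta$, which are evaluated here by Simpson's rule. The function names are hypothetical.

```python
import math

def cos_power_integral(a, b, power, steps=200):
    """Simpson's rule for the integral of cos(theta)**power over [a, b]."""
    h = (b - a) / steps
    total = math.cos(a) ** power + math.cos(b) ** power
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * math.cos(a + i * h) ** power
    return total * h / 3.0

def concentration_of_r(n, classes=20):
    """Concentration k for the distribution of r, with `classes` equal intervals on [-1, 1]."""
    edges = [(2 * i - classes) / classes for i in range(classes + 1)]
    areas = [cos_power_integral(math.asin(lo), math.asin(hi), n - 3)
             for lo, hi in zip(edges, edges[1:])]
    total = sum(areas)
    return sum((a / total) ** 2 for a in areas)

for n in range(3, 11):
    print(n, round(concentration_of_r(n), 4))
```

With 20 classes the interval is 0.1, as in the table; the substitution avoids the infinite ordinates at the ends of the range when $n$ = 3.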

Standard errors. Let us first obtain the standard error of $N_2$. Using $m$’s
to denote the frequencies we have

$$N_2 = m_1^2 + m_2^2 + \ldots + m_r^2,$$

$$dN_2 = 2\,(m_1\,dm_1 + m_2\,dm_2 + \ldots + m_r\,dm_r), \qquad (8)$$

$$S\,(dN_2)^2 = 4S_1\{m_p^2\,(dm_p)^2\} + 8S_2\,(m_p m_q\,dm_p\,dm_q),$$

where $S_1$ is a sum for all values of $p$ from 1 to $r$ and all of, say, $n$ samples; and
$S_2$ is a sum for all different values of $p$ and $q$ and all samples.

Carrying out $S_1$ for any particular value of $p$ on all samples, we get

$$n\,m_p^2\,m_p\left(1 - \frac{m_p}{N_1}\right) = n\left(m_p^3 - \frac{m_p^4}{N_1}\right).$$

Carrying out $S_2$ for any particular values of $p$ and $q$ on all samples, we get

$$-\,n\,\frac{m_p^2 m_q^2}{N_1}.$$

Therefore

$$S\,(dN_2)^2 = 4nS_1(m_p^3) - 4\frac{n}{N_1}S_1(m_p^4) - 8\frac{n}{N_1}S_2(m_p^2 m_q^2). \qquad (9)$$

* The tables in Biometrika, 11 (1917), 379 et seq. give ordinates at every 0.05 of the scale.
From these the frequencies in intervals of 0.1 were calculated, and thence the concentrations,

The first two sums on the right may be called $N_3$ and $N_4$ as before. As regards
the third, since

$$N_2^2 = S_1(m_p^4) + 2S_2(m_p^2 m_q^2),$$

its value is $\tfrac{1}{2}(N_2^2 - N_4)$.
Therefore dividing through by $n$,

$$\sigma^2(N_2) = 4N_3 - \frac{4}{N_1}N_4 - \frac{4}{N_1}(N_2^2 - N_4) = 4\left(N_3 - \frac{N_2^2}{N_1}\right), \qquad (10)$$

and hence at once

$$\sigma^2(k) = \sigma^2(N_2/N_1^2) = \frac{4}{N_1^4}\left(N_3 - \frac{N_2^2}{N_1}\right). \qquad (11)$$

But if $z = c/x$, $\sigma_z/z = \sigma_x/x$ for small deviations and hence

$$\sigma^2(\sigma_f) = \sigma_f^2\,\frac{\sigma^2(N_2)}{N_2^2} = 4\sigma_f^2\left(\frac{N_3}{N_2^2} - \frac{1}{N_1}\right). \qquad (12)$$


It is interesting to see how this value compares with the ordinary standard
error for the standard deviation of a normal distribution determined from the
second moment,

$$\sigma(\sigma) = \frac{\sigma}{\sqrt{2N_1}} = 0.7071\,\frac{\sigma}{\sqrt{N_1}}. \qquad (13)$$

For the normal distribution we have from (3) and (5)

$$N_3 = \frac{2}{\sqrt{3}}\,\frac{N_2^2}{N_1}. \qquad (14)$$

Substituting this in (12) and using (3)

$$\sigma^2(\sigma_f) = 4\left(\frac{2}{\sqrt{3}} - 1\right)\frac{\sigma^2}{N_1} = 0.6188\,\frac{\sigma^2}{N_1}, \qquad (15)$$

or, taking the square root, for a normal distribution

$$\sigma(\sigma_f) = 0.7866\,\frac{\sigma}{\sqrt{N_1}}. \qquad (16)$$

The standard error of $\sigma_f$ therefore exceeds that of $\sigma$ by 11.25% or roundly
one-ninth, if the distribution is strictly normal.

The proportionate difference between the result given by (12) and that
given by (13) appears, however, to be very sensitive to small deviations from
absolute normality. For the stature distributions of fathers and sons (examples
2 and 3) I find for the values of $N_3$ 15,271,469 and 16,527,314 respectively, and
hence for the values of $\sigma(\sigma_f)$

Fathers, 0.0629; Sons, 0.0725.

Inserting the values of $\sigma_f$ in (13) gives

Fathers, 0.0588; Sons, 0.0575.

For fathers the first value is only some 7% in excess of the second, for sons
some 26%. The first distribution is slightly platykurtic, the second slightly
leptokurtic: or, what is perhaps more directly to the point, in the first case the
value of $N_3$ is less than that given by (14) in terms of $N_1$ and $N_2$ (15,271,469
against 15,426,370); in the second case greater (16,527,314 against 16,154,229).
The standard errors of $k$ by equation (11) for these two examples are
0.00238 ($k$ = 0.1033) and 0.00274 ($k$ = 0.1057) or between 2 and 3% of the values.
I am obliged to Mr M. G. Kendall for help in this work on standard
errors.
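The standard-error formulae can be checked on the fathers' distribution from the sums quoted in the text. In this sketch equation (12) is read as $\sigma^2(\sigma_f) = 4\sigma_f^2(N_3/N_2^2 - 1/N_1)$, which is what (10) and the reciprocal relation imply; the function names are hypothetical.

```python
import math

def sigma_squares(n1, n2):
    """Equation (3)."""
    return n1 * n1 / (2.0 * math.sqrt(math.pi) * n2)

def se_sigma_squares(n1, n2, n3):
    """Equation (12), read as sigma^2(sigma_f) = 4 sigma_f^2 (N3/N2^2 - 1/N1)."""
    s = sigma_squares(n1, n2)
    return 2.0 * s * math.sqrt(n3 / n2 ** 2 - 1.0 / n1)

def se_sigma_moment(sigma, n1):
    """Equation (13): normal-theory standard error sigma / sqrt(2 N1)."""
    return sigma / math.sqrt(2.0 * n1)

def se_concentration(n1, n2, n3):
    """Equation (11): sigma^2(k) = (4 / N1^4) (N3 - N2^2 / N1)."""
    return 2.0 * math.sqrt(n3 - n2 ** 2 / n1) / n1 ** 2

# Fathers (example 2): N1 = 1078, N2 = 120,007, N3 = 15,271,469.
n1, n2, n3 = 1078, 120007, 15271469
print(round(se_sigma_squares(n1, n2, n3), 4))
print(round(se_sigma_moment(sigma_squares(n1, n2), n1), 4))
print(round(se_concentration(n1, n2, n3), 5))
```

For a strictly normal distribution the ratio of (16) to (13) is $\sqrt{8(2/\sqrt{3} - 1)}$, about 1.1125, which is the excess of 11.25% stated above.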


2. BIVARIATE DISTRIBUTIONS 


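Consider the normal distribution

$$z = \frac{N_1}{2\pi\sigma_1\sigma_2(1-r^2)^{1/2}}\exp\left\{-\frac{1}{2(1-r^2)}\left(\frac{x_1^2}{\sigma_1^2} + \frac{x_2^2}{\sigma_2^2} - \frac{2r\,x_1x_2}{\sigma_1\sigma_2}\right)\right\}. \qquad (17)$$

On squaring as before, the exponential evidently takes the form for a normal
distribution with correlation $r$ but the standard deviations reduced to $\sigma_1/\sqrt{2}$
and $\sigma_2/\sqrt{2}$. The term for the central ordinate is altered to

$$\frac{N_1^2}{4\pi^2\sigma_1^2\sigma_2^2(1-r^2)},$$

which may be written

$$\frac{N_1^2}{4\pi\sigma_1\sigma_2(1-r^2)^{1/2}} \times \frac{1}{2\pi\,\dfrac{\sigma_1}{\sqrt{2}}\,\dfrac{\sigma_2}{\sqrt{2}}\,(1-r^2)^{1/2}}.$$

That is, if $N_2'''$ is the sum of the squares of frequencies,

$$N_2''' = \frac{N_1^2}{4\pi\sigma_1\sigma_2(1-r^2)^{1/2}}, \qquad (1-r^2)^{1/2} = \frac{1}{4\pi\sigma_1\sigma_2}\,\frac{N_1^2}{N_2'''}. \qquad (18)$$

But by (3), if $N_2'$, $N_2''$ are the sums of squares of frequencies for the total
distributions of $x_1$ and $x_2$ respectively,

$$2\sqrt{\pi}\,\sigma_1 = \frac{N_1^2}{N_2'}, \qquad 2\sqrt{\pi}\,\sigma_2 = \frac{N_1^2}{N_2''},$$

whence

$$(1-r^2)^{1/2} = \frac{N_2'N_2''}{N_1^4} \times \frac{N_1^2}{N_2'''} = \frac{N_2'N_2''}{N_1^2} \times \frac{1}{N_2'''}. \qquad (19)$$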


This is the convenient form for arithmetic, as it involves only the three sums of
squares $N_2'$, $N_2''$ and $N_2'''$, which can be run off very quickly with the aid of
Burroughs’ Machine. But the first fraction is the sum of squares of the
independence frequencies, so we have also the interesting form

$$(1-r^2)^{1/2} = \frac{S\,(\text{independence frequencies})^2}{S\,(\text{actual frequencies})^2}. \qquad (20)$$


I tested (19) on Karl Pearson’s correlation table for stature of father and 
son, for which the product-sum correlation (using corrected standard deviations) 
is 0.5199, retaining the 1 in. grouping. The values of the $N$’s found are

    $N_1$         1,078
    $N_2'$        120,007
    $N_2''$       122,805.5
    $N_2'''$      15,469.25

whence $(1-r^2)^{1/2}$ = 0.81982, $r$ = 0.5726.
This is appreciably higher than the product-sum correlation (0.57 against 0.52),
but for the coefficient of mean square contingency I find an almost identical
value, viz. 0.5736.*
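The arithmetic of equation (19) for the father and son table can be retraced from the four sums just listed. A minimal sketch, with hypothetical argument names for the marginal and cell sums of squares:

```python
import math

def r_from_sums_of_squares(n1, n2_first, n2_second, n2_cells):
    """Equation (19): returns ((1 - r^2)^(1/2), r) from the three sums of squares."""
    ratio = (n2_first * n2_second) / (n1 ** 2 * n2_cells)
    if ratio > 1.0:
        raise ValueError("right-hand side exceeds unity; r would be imaginary")
    return ratio, math.sqrt(1.0 - ratio ** 2)

ratio, r = r_from_sums_of_squares(1078, 120007, 122805.5, 15469.25)
print(round(ratio, 5), round(r, 4))
```

The guard against a ratio above unity anticipates the difficulty discussed under objection (3) further on.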

Clearly, equation (19) is no more to be recommended as a method of 
calculating the correlation coefficient than is the method of mean square 
contingency. But the question naturally arises whether (19) can be used to 
give a coefficient of contingency, say K, defined by the relation 


$$(1-K^2)^{1/2} = \frac{N_2'N_2''}{N_1^2} \times \frac{1}{N_2'''}. \qquad (21)$$


I am afraid the answer is definitely, No. Let us briefly rehearse the properties 
of K. 

(1) In the first place the maximum value of K is not unity but less than 
unity, except in the limit for indefinitely fine grouping. Consider the sym- 
metrical contingency table in which the distributions of row-totals and column- 
totals are the same, and all the frequencies in the cells of the table lie in the 
diagonal compartments. We then have $N_2' = N_2'' = N_2'''$, and

$$(1-K^2)^{1/2} = \frac{N_2'}{N_1^2} = k.$$

Only for indefinitely fine grouping will the concentration k tend to zero, and 
the maximum value of K to unity. The coefficient of mean square contingency 
shares, of course, this property that for practical groupings the maximum value 
attainable is less than unity, but the two coefficients react differently to different 
groupings. 


* Though the stature table is largely used for illustrations in the original memoir (“On the
theory of contingency, etc.”, Drapers’ Company Research Memoirs, Biometric Series, 1, 1904) this
coefficient was not calculated for the ungrouped table: see p. 29 of the memoir.

(2) When there is complete independence the coefficient of mean square
contingency is zero, and conversely when the coefficient is zero there must be 
complete independence. But these reciprocal statements cannot be made for K. 
If there is complete independence K must be zero: but if K be zero it does not 
necessarily follow that there is complete independence. In this respect K 
resembles the coefficient of correlation. In fact, in terms of (20), K=0 only 
implies that the sum of the squares of the actual frequencies must be equal to
the sum of the squares of the independence frequencies, and a large number of 
different possible “sets” of “actual” frequencies may fulfil this condition. 
If a “coefficient of contingency” be defined as a coefficient the zero value of 
which implies that there is complete independence, evidently this property is 
fatal to the claims of K. There is no difficulty in constructing simple illustrative 
examples, but in view of the third objection it is scarcely worth while spending 
space on the point. 


(3) There is no algebraical necessity for the right-hand side of (21), in the 
general case, to be less than unity: consequently K can become imaginary, as 
actually happens for the following table, the figures of which are taken from 
Biometrika, 20A (1928), 120, Table (9). 


Table giving K imaginary

                1       2       3     Total
       1       27      54      68      149
       2        4       9      22       35
       3       53      62      87      202
     Total     84     125     177      386

Here $N_1$ = 386, $N_2'$ = 64,230, $N_2''$ = 54,010, $N_2'''$ = 23,072, giving
$(1-K^2)^{1/2}$ = 1.0091....
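The sums for this table can be verified directly. In the sketch below the first cell of the second row is taken as 4, the value implied by the row total 35 and the column total 84; the variable names are hypothetical.

```python
table = [
    [27, 54, 68],
    [4, 9, 22],    # first cell inferred from the marginal totals (35 and 84)
    [53, 62, 87],
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]

n1 = sum(row_totals)
n2_rows = sum(t * t for t in row_totals)
n2_cols = sum(t * t for t in col_totals)
n2_cells = sum(v * v for row in table for v in row)

# Right-hand side of (21); a value above unity makes K imaginary.
ratio = n2_rows * n2_cols / (n1 ** 2 * n2_cells)
print(n1, n2_rows, n2_cols, n2_cells, round(ratio, 4))
```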

With regret one must conclude that though (21) would give a coefficient 
much more easily calculated than the coefficient of mean square contingency, 
and one for which a definite standard error could comparatively readily be 
determined, and though for the great majority of tables the coefficient would 
behave in quite a normal way, the properties (2) and (3) are definitely fatal. 
Consequently equation (18) remains just an interesting, and so far as I know a 
novel, property of the normal distribution. 













A HISTORICAL NOTE ON KARL PEARSON’S DEDUCTION 
OF THE MOMENTS OF THE BINOMIAL 


By J. NEYMAN 


READERS of statistical journals can hardly fail to have noticed what appears to 
be a regular campaign carried on by Prof. R. A. Fisher to discredit the work of 
the late Karl Pearson. Owing to the tone and form of these depreciations, one 
feels reluctant to reply, but it seems to be useful to point out just one instance
illustrating the methods used by Prof. Fisher. 

Commenting on the achievements of the late W. F. Sheppard, Prof. Fisher 

(1937b, pp. 10-11 note) recently wrote:

“The confusion of thought into which his [Sheppard’s] work brought order is
well illustrated throughout Pearson’s (1895) paper. Thus to the formula

$$\mu_2 = c^2(npq + \epsilon_2) \qquad (1)$$

Pearson adds the footnote:

‘This result seems of considerable importance, and I do not believe it has yet
been noticed. It gives the root mean square error for any binomial distribution,
and we see that for most practical purposes it is identical with the value hitherto
deduced as an approximate result, by assuming the binomial to be approximately
a normal curve.’

It is extraordinary that Pearson should ever have doubted the exactitude
of the formula $\sqrt{npq}$ for the root mean square of the binomial, or should have
thought that its derivation need contain any reference to the normal curve. As
$\epsilon_2$, in his formula, is indeterminate, it is also difficult to see what objective
problem he thought his formula to solve. The use of an indefinite symbol suggests
that the value $\tfrac{1}{6}$ obtained for trapezia, was felt to be only conventional. We owe
much to Sheppard for obviating the worst consequences of confusions of this
kind. (R.A. F.)”

Owing to the authority which Prof. Fisher enjoys, I at first accepted his 
criticisms, but having an easy access to Pearson’s paper under discussion and, as 
I could not find in my recollections from reading it many years ago anything so 
surprising as produced by Prof. Fisher, I decided to look at it again. Volumes 
of Philosophical Transactions of 1895 are not always readily accessible and 
probably not many of Fisher’s readers will have had the occasion of comparing 
his quotation and comment with Pearson’s actual text. Here is what I have 
found. 

The footnote quoted by Prof. Fisher is attached to a section discussing the 
moments of the binomial, namely to the following passage: 











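“Hence we may write:

$$\mu_2 = c^2(npq + \epsilon_2),$$
$$\mu_3 = c^3\,npq\,(p - q), \qquad (2)$$
$$\mu_4 = c^4(\epsilon_4 + npq\{\epsilon_6 + 3(n-2)pq\}),$$

we have

    For trapezia:          $\epsilon_2 = \tfrac{1}{6}$,  $\epsilon_4 = \tfrac{1}{15}$,  $\epsilon_6 = 2$
    For rectangles:        $\epsilon_2 = \tfrac{1}{12}$, $\epsilon_4 = \tfrac{1}{80}$,  $\epsilon_6 = 1.5$      (3)
    For loaded ordinates:  $\epsilon_2 = 0$,  $\epsilon_4 = 0$,   $\epsilon_6 = 1$”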
If the reader is accustomed to the modern terminology, then he will be puzzled:
he knows of only one binomial distribution ascribing to any point with the
abscissa $x_k = a + ck$, $k$ = 0, 1, 2, ... the probabilities

$$P(x_k) = \frac{n!}{k!\,(n-k)!}\,p^k q^{n-k}, \qquad (4)$$

where $c$, $p$ and $q$ are positive with $p + q$ = 1 and $a$ is an arbitrary number. On the
other hand, the passage quoted suggests at least three kinds of “binomial
distributions”, one with trapezia, another with rectangles and still another with
loaded ordinates. The inevitable conclusion is that the terminology of 1895 must
have been different from what it is now and that forty odd years ago the
conception of the binomial distribution must have been broader than it is now.
This conclusion compelled me to look through some more pages of Karl Pearson’s
memoir. It deals primarily with methods of approximating continuous frequency
curves by means of some processes involving the calculation of easy formulae.
One of these formulae considered was the “point-binomial” or the “binomial
with loaded ordinates”. The formula differs from what to-day we call a binomial,
viz. (4), only by a factor $\alpha$, representing the area under the continuous curve which
it is desired to fit. Thus

$$P(x_k) = \alpha\,\frac{n!}{k!\,(n-k)!}\,p^k q^{n-k}. \qquad (5)$$

Karl Pearson’s idea of fitting a continuous curve with a point binomial determining
only $n + 1$ discrete points, say $A_0, A_1, \ldots A_n$ with abscissae $x_k = a + ck$
and the ordinates (5) was based on the fact that frequently, by a proper choice
of the constants $a$, $c$, $n$ and $p$, the points $A_k$ may be made almost to coincide with
those lying on the continuous curve. Thus he says (1895, p. 345):

“It is well known that the points of the point binomial $(\tfrac{1}{2} + \tfrac{1}{2})^n$ coincide very
closely with the contour of a normal curve when $n$ is only moderately large.”

Therefore, to fit a continuous frequency curve with a point-binomial, it is
necessary: (1) to determine the constants $a$, $c$, $n$ and $p$ so that the points $A_0, A_1, \ldots A_n$
be as close as possible to the curve, and (2) to connect these points by some
lines to form a continuous curve, say $L$. The nature of the lines which should
connect the points $A_k$ is not specified in the paper, but there are reasons to
think that Karl Pearson had in mind the drawing of them smoothly with a
“spline” or by other mechanical means, as was customary in the engineering
drawing office of which he was then in charge at University College.

The general term of “binomial distribution” was therefore attached, not to
the formula (4) or (5), but to a continuous smooth line $L$ passing through the
points $A_0, A_1, \ldots A_n$ determined by (4) or (5). As it was desired that just this curve
$L$ should be as close as possible to the curve which it was sought to approximate
and as the method of fitting employed was that of moments, it was natural for
Karl Pearson to try to calculate the moments of $L$. As the portions of this line
between the fixed points $A_0, A_1, \ldots A_n$ are unspecified, the calculation of moments
could not without fuller specification give definite results. Karl Pearson starts
with deducing the moments of the “binomial with loaded ordinates” and then
proceeds to find out how these moments would change if the $n + 1$ discrete points
$A_k$ are connected together (i) by straight lines (this case is called “for trapezia”)
and (ii) by building on them rectangles so as to form a histogram. This brings him
to the formulae reproduced above as (2) and (3) and he presumes that any other
“binomial,” that is to say any smooth line $L$ passing through the points $A_0, A_1, \ldots A_n$
will have its moments in the form (2) with the $\epsilon$’s having their values not very
different from those corresponding to the “rectangles” or “trapezia”.

It may be worth while to reproduce here Karl Pearson’s deduction of the
moments of the “point-binomial”, as given by him on pp. 345-7 of the paper
discussed.

“Consider a series of rectangles on equal base $c$ and whose heights are respectively
the successive terms of the binomial $(p+q)^n \times \alpha/c$ where $p + q$ = 1. Here $\alpha$
is clearly the area of the entire system. Choose as origin a point $O$ distant $\tfrac{1}{2}c$ from
the boundary of the first rectangle, on the line of common bases, and let $y_r$ be
the height of the $r$th rectangle, or

$$y_r = \frac{\alpha}{c}\,\frac{n(n-1)\ldots(n-r+2)}{(r-1)!}\,p^{n-r+1}q^{r-1}, \qquad (6)$$

while

$$y_1 = \frac{\alpha}{c}\,p^n. \qquad (7)$$

Let us find the values of

$$\Sigma\{y_r c \times (rc)^s\} \qquad (8)$$

where $s$ is any integer, for values of $s$ from 0 to 4.
It is easy to see that

$$\Sigma\{y_r c \times (rc)^s\} = \alpha c^s\,\frac{d}{dq}\left(q\,\frac{d}{dq}\left(q\,\frac{d}{dq}\ldots\left(q(p+q)^n\right)\right)\right) \qquad (9)$$

where the operation $d/dq$ is repeated $s$ times.
The operations indicated can easily be performed by putting $q = e^u$ when

$$\Sigma\{y_r c \times (rc)^s\} = \frac{\alpha c^s}{e^u}\left(\frac{d}{du}\right)^s\{e^u(p+e^u)^n\} \qquad (10)$$

and the successive values can be found by Leibnitz’s theorem. After differentiation
we may put $p + q$ or $p + e^u$ = 1. There results:

$$\Sigma(y_r c) = \alpha$$
$$\Sigma(y_r c \times rc) = \alpha c\{1 + nq\}$$
$$\Sigma(y_r c \times (rc)^2) = \alpha c^2\{1 + 3nq + n(n-1)q^2\}$$
$$\Sigma(y_r c \times (rc)^3) = \alpha c^3\{1 + 7nq + 6n(n-1)q^2 + n(n-1)(n-2)q^3\}$$
$$\Sigma(y_r c \times (rc)^4) = \alpha c^4\{1 + 15nq + 25n(n-1)q^2 + 10n(n-1)(n-2)q^3 + n(n-1)(n-2)(n-3)q^4\} \qquad (11)$$

Let $NG$ be the vertical through the centroid of the system of rectangles, then
clearly

$$ON = \Sigma\{y_r c \times rc\}/\alpha = c\{1 + nq\}. \qquad (12)$$

We shall now proceed to find the first four moments of the system of rectangles
round $GN$. If the inertia of each rectangle might be considered as concentrated along
its mid vertical, we should have for the $s$th moment round $NG$, writing $d = c(1 + nq)$

$$\alpha\mu_s = \Sigma\{y_r c \times (rc - d)^s\}. \qquad (13)$$

The resulting values are

$$\mu_2 = npq\,c^2,$$
$$\mu_3 = npq\,(p - q)\,c^3, \qquad (14)$$
$$\mu_4 = npq\,(1 + 3(n-2)pq)\,c^4,$$

whence, remembering that $p + q$ = 1, we find that $p$ and $q$ are roots of

$$z^2 - z + \frac{(3\mu_2^2 - \mu_4)\mu_2 + \mu_3^2}{4(3\mu_2^2 - \mu_4)\mu_2 + 6\mu_3^2} = 0, \qquad (15)$$

$$n = \frac{2\mu_2^3}{(3\mu_2^2 - \mu_4)\mu_2 + \mu_3^2}, \qquad c = \frac{\sqrt{2(3\mu_2^2 - \mu_4)\mu_2 + 3\mu_3^2}}{\mu_2}. \qquad (16)$$

Thus, when $\mu_2$, $\mu_3$, and $\mu_4$ have been calculated for the frequency curve, the
elements of the point-binomial are known. These results were given by me in a
letter to Nature, October 26th, 1893.”

This passage provides, of course, sufficient evidence that Karl Pearson did not 
have the slightest doubt as to the accuracy of the formula $\sqrt{npq}$ as applied to the
point-binomial and of similar formulae for the higher moments. It is seen also 
that they have been derived without any reference to the normal curve. 
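The moments of the point-binomial quoted as (14), and the inversion in (15) and (16), can be checked numerically. The sketch below uses unit area ($\alpha$ = 1) and hypothetical function names; the ordinates are placed at abscissae $rc$ as in the passage, and the larger root of (15) is assigned to $p$ when $\mu_3$ is positive, in agreement with (14).

```python
import math

def point_binomial_moments(n, p, c):
    """Mean and central moments mu2..mu4 of loaded ordinates at r*c, r = 1..n+1 (unit area)."""
    q = 1.0 - p
    weights = [math.comb(n, r - 1) * p ** (n - r + 1) * q ** (r - 1)
               for r in range(1, n + 2)]
    mean = sum(w * r * c for r, w in enumerate(weights, start=1))

    def mu(s):
        return sum(w * (r * c - mean) ** s for r, w in enumerate(weights, start=1))

    return mean, mu(2), mu(3), mu(4)

def recover_parameters(mu2, mu3, mu4):
    """Equations (15) and (16): recover n, p, q and c from the first three central moments."""
    x = (3.0 * mu2 ** 2 - mu4) * mu2
    pq = (x + mu3 ** 2) / (4.0 * x + 6.0 * mu3 ** 2)
    n = 2.0 * mu2 ** 3 / (x + mu3 ** 2)
    c = math.sqrt(2.0 * x + 3.0 * mu3 ** 2) / mu2
    root = math.sqrt(1.0 - 4.0 * pq)
    big, small = (1.0 + root) / 2.0, (1.0 - root) / 2.0
    p, q = (big, small) if mu3 >= 0 else (small, big)
    return n, p, q, c

mean, mu2, mu3, mu4 = point_binomial_moments(10, 0.7, 2.0)
print(mean, mu2, mu3, mu4)
print(recover_parameters(mu2, mu3, mu4))
```

The direct summation makes no use of the normal curve, which is the point of the paragraph above.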





Before concluding, it may be useful to recall opinions on Karl Pearson’s work
expressed by Fisher only a few years ago. In his paper of 1922 he wrote (p. 314):

“We must confine ourselves to those forms [of distributions] which we know
how to handle, or for which any tables which may be necessary have been 
constructed. More or less elaborate forms will be suitable according to the 
volume of the data. Evidently these are considerations the nature of which may 
change greatly during the work of a single generation. We may instance the 
development by Pearson of a very extensive system of skew curves, the elabora- 
tion of a method of calculating their parameters, and the preparation of the 
necessary tables, a body of work which has enormously extended the power of 
modern statistical practice, and which has been, by pertinacity and inspiration 
alike, practically the work of a single man. Nor is the introduction of the 
Pearsonian system of frequency curves the only contribution which their author 
has made to the solution of problems of specification : of even greater importance 
is the introduction of an objective criterion of goodness of fit.” 

And again, p. 321: 

“Perhaps the most extended use of the criterion of consistency has been
developed by Pearson in the ‘Method of Moments’. In this method, which is 
without question of great practical utility, different forms of frequency curves 
are fitted by calculating as many moments of the sample as there are parameters 
to be evaluated.” 

These paragraphs were published in 1922. But tempora mutantur. Karl
Pearson is no more and R. A. Fisher’s attitude towards his work becomes that
of a self-confident and severe judge. Now, the Method of Moments is for
Professor Fisher (1937a) far from being of “great practical utility” and the
system of Pearson curves is not worth teaching in statistical departments.
Seeing that the roots of Professor Fisher’s work can be easily traced to Karl 
Pearson, with whose achievements he must have been always well acquainted, 
this instability of opinion is worth noting. 


The speed with which mathematical statistics has developed since the first 
work of Karl Pearson is a remarkable feature in the history of science, and I have 
no doubt that, in time, it will be carefully studied and the role played in it by 
particular workers objectively determined. This is for the future; but at the 
present a protest may be made when a critic overlooks whole passages in the 
main text of a paper and misinterprets footnotes which are clear on reference to 
the text on the same page. 


GROWTH-RATE DETERMINATIONS IN NUTRITION
STUDIES WITH THE BACON PIG, AND 
THEIR ANALYSIS 


By JOHN WISHART 
School of Agriculture, Cambridge 


INTRODUCTION 


Some time ago the writer had occasion to analyse statistically the results of a 
nutrition experiment, in which three different levels of protein content in the food 
ration were tested on three groups of bacon pigs, which were under experiment 
from shortly after weaning until they left for the bacon factory at 200 lb. live weight. 
For full details the paper presenting the results may be consulted (Woodman et al.,
1936). In accordance with standard experimental practice the variable examined
was the live-weight gain, i.e. the difference, in lb., between the initial and final
weights of each pig over the 16-week period of the experiment. The results showed an
average drop of about 4½ lb. in live-weight gain from treatment A (ranging from 17.5
to 12.2 % crude protein as the experiment proceeded) to treatment B (22.1 to 16.9 %
crude protein), and a further drop of about 4 lb. from treatment B to treatment C
(26.8 to 21.7 % crude protein). On the analysis of variance test, however, the estimate
of treatment variance, with two degrees of freedom, was not significant, and even
the “principal effect” of the drop from A to C, with a single degree of freedom, was
not significant when isolated. The mean square for this principal effect was obtained
directly from the difference of the totals, or means, for A and C, since the increase
in protein percentage was linear. It was not that the standard error was abnormally
high; in fact the experiment was a very accurate one, the percentage
drop in live-weight gain from A to C being 5.7, with a standard error of 2.9; but
the effect was small. Considering that the variation in the initial weights of the
pigs (who were all of the same age) may have contributed somewhat to the
experimental error, this concomitant variable was taken account of by an analysis
of co-variance. The result was that the principal effect could now be adjudged
significant at the 5 % level (z = 0.8561 with n₁ = 1, n₂ = 21). Slender as the effect
was, it was considered worthy of note, the percentage drop from A to C for
animals adjusted to have the same initial weight being 5.6, with a standard error
of 2.4.

The animals were carefully weighed every week, and yet in the investigation, the 
results of which have been briefly summarized, no account was taken of the figures 
other than the initial and final ones, though intervening weight measurements 
were used, naturally, in fixing the amount of meal to be fed. This raises the point 
of whether more accurate information may not be gained if the other fifteen
observations are brought into the picture. It is not wholly a statistical point, for
doubts have been raised in the experimenter’s mind whether the figures of live-weight
gain are an adequate summary of the information available. For this
reason an investigation has been made of the seventeen consecutive weight
measurements, at weekly intervals, of the thirty individual pigs constituting the
experiment, and the purpose of the present paper is to give the results of this
study.


DETERMINATION OF GROWTH CURVE 


The weight figures, to the nearest lb., were first plotted on ordinary, and then on
logarithmic paper, from week 0 to week 16. This was the period for which observations
are available for all pigs. Not all were withdrawn at the end of this time,
as some had not reached 200 lb. It was, however, thought desirable to keep the
time period the same in all cases. The weight curve was very regular, and showed
an upward curvature; since the logarithms showed no sign of linearity, but a
definite downward curvature, it was thought best to do the curve fitting on the
actual weights. The next stage was to fit a parabola of the second degree to the
figures for each pig. As a sample of the calculations involved, the figures for one
pig are given (Table I), the method of orthogonal polynomial fitting followed
being that of Aitken (1933). The number of the week is represented by x, and the
weight (in lb.) by w. In Table I the calculations have been carried as far as needed
for the fitting of a (second degree) parabola. The three columns following w are
obtained by continuous summation from the bottom upwards, stopping at the
result underlined. These underlined numbers are then carried to a column on the
left of the coefficients enclosed in a square. These latter are readily calculated, as
shown by Aitken, but they are taken from his table for n = 17, as also the quantities
Σ(T²) at the foot. The coefficients a₀, a₁ and a₂ are now obtained by cross-multiplying
the numbers on the left by the coefficients of each column, row by
row, summing and dividing by the number at the foot of the column. It is desirable
to note down the numerator before division, because it will be needed later. We
are now ready to write down the equation to the parabola. Utilizing the coefficients
within the square, it is, in the factorial form appropriate to Aitken’s method,


$$W = a_0 + a_1(2x - 16) + a_2\left\{\frac{6x(x-1)}{1\cdot 2} - 45x + 120\right\}$$

$$\phantom{W} = 118.12 + 4.972\,(2x - 16) + 0.0664\left\{\frac{6x(x-1)}{1\cdot 2} - 45x + 120\right\}.$$
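
The summation scheme just described can be sketched in a few lines. This is an illustration only, not Aitken's own layout: the weights are those of the pig in Table I, the coefficients in the square and the sums Σ(T²) for n = 17 are the values quoted in the text, and the helper name `sums_from_bottom` is hypothetical.

```python
# Weekly weights (lb.) of the pig of Table I, weeks 0 to 16.
weights = [48, 54, 60, 67, 76, 86, 94, 104, 112,
           124, 134, 144, 158, 170, 181, 192, 204]

def sums_from_bottom(values):
    """Running totals accumulated from the last entry upwards."""
    totals, running = [], 0
    for v in reversed(values):
        running += v
        totals.append(running)
    return totals[::-1]

col1 = sums_from_bottom(weights)     # 17 entries; the top entry is the total of w
col2 = sums_from_bottom(col1[1:])    # stops one row lower
col3 = sums_from_bottom(col2[1:])    # stops two rows lower
heads = [col1[0], col2[0], col3[0]]  # the "underlined" results

# Coefficients in the square for n = 17, one list per column,
# and the sums of squares of the polynomial values at the foot.
columns = [[1], [-16, 2], [120, -45, 6]]
sum_t2 = [17, 1632, 69768]

# Cross-multiply the underlined numbers by each column, row by row, and sum.
numerators = [sum(s * k for s, k in zip(heads, col)) for col in columns]
a0, a1, a2 = (num / den for num, den in zip(numerators, sum_t2))
```

The numerators are kept as a separate list because, as noted above, they are needed again when the sums of squares are partitioned.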


The equation may, however, be transformed in any way we please. In Fisher’s
(1936) form, writing w̄, g and h for Fisher’s constants A, B and C, to avoid confusion
with our food treatments, the equation is

$$W = \bar{w} + g\,(x - \bar{x}) + h\,\{(x - \bar{x})^2 - (n^2 - 1)/12\},$$

TABLE I

  x      w     Successive sums from the bottom        W
  0     48      2008        —          —            46.5
  1     54      1960     20121         —            53.5
  2     60      1906     18161     111520           60.8
  3     67      1846     16255      93359           68.6
  4     76      1779     14409      77104           76.8
  5     86      1703     12630      62695           85.3
  6     94      1617     10927      50065           94.2
  7    104      1523      9310      39138          103.6
  8    112      1419      7787      29828          113.3
  9    124      1307      6368      22041          123.5
 10    134      1183      5061      15673          134.0
 11    144      1049      3878      10612          145.0
 12    158       905      2829       6734          156.3
 13    170       747      1924       3905          168.0
 14    181       577      1177       1981          180.2
 15    192       396       600        804          192.7
 16    204       204       204        204          205.6

                                      Σ(w − W)² = 20.55

               a₀         a₁         a₂
             118.12      4.972     0.0664
   2008         1        −16        120         W₀   = 46.54
  20121                    2        −45         ΔW₀  = 6.956
 111520                               6         Δ²W₀ = 0.3984
  Σ(T²)        17       1632      69768         205.64 = W₁₆

                                    D.F.    Sum of squares    Mean square
 Σ(w²)        = 277850.00
 2008²/17     = 237180.24            16        40669.76
 8114²/1632   =  40341.30            15          328.46
 4635²/69768  =    307.92            14           20.54          1.4671

 Standard error = 1.211 or 1.025 %

 W = 118.12 + 4.972 (2x − 16) + 0.0664 {6x(x − 1)/(1·2) − 45x + 120}
   = 118.12 + 9.944 (x − 8) + 0.1993 {(x − 8)² − 24}

where x̄ = 8, the mean week, and n is 17. Thus

$$W = 118.12 + 9.944\,(x - 8) + 0.1993\,\{(x - 8)^2 - 24\}.$$


The interpretation of g and h is that the former is the average growth rate in lb.
per week, while the latter is just half the rate of change of the growth rate in lb.
per week per week. w̄ (= a₀) is, of course, the average weight of the pig in lb.
throughout the period.
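
The same constants may be reached directly, without the summation scheme, by regressing the weights on the two orthogonal polynomials x − x̄ and (x − x̄)² − (n² − 1)/12. The sketch below assumes only the Table I weights; the variable names are illustrative.

```python
# Weekly weights (lb.) of the pig of Table I, weeks 0 to 16.
weights = [48, 54, 60, 67, 76, 86, 94, 104, 112,
           124, 134, 144, 158, 170, 181, 192, 204]

n = len(weights)
xbar = (n - 1) / 2                                  # mean week, 8 for weeks 0..16
xi1 = [x - xbar for x in range(n)]                  # linear polynomial
xi2 = [(x - xbar) ** 2 - (n * n - 1) / 12 for x in range(n)]  # parabolic polynomial

mean_w = sum(weights) / n                           # average weight over the period
# Orthogonality lets each constant be found separately.
g = sum(w * t for w, t in zip(weights, xi1)) / sum(t * t for t in xi1)
h = sum(w * t for w, t in zip(weights, xi2)) / sum(t * t for t in xi2)

fitted = [mean_w + g * t1 + h * t2 for t1, t2 in zip(xi1, xi2)]
residual_ss = sum((w - f) ** 2 for w, f in zip(weights, fitted))
```

Because the two polynomials are orthogonal over the seventeen weeks, g and h do not change if a cubic term is added later, which is what makes them convenient summary figures for each pig.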

These calculations were carried out for each pig, but in the case of the one
illustrated the additional calculations show what must be done (1) to find the
polynomial values W corresponding to the observed w, and (2) to test the significance
of the linear and parabolic terms. The calculations are for the most part
self-explanatory, but it should be explained that W₀, the value for x = 0, ΔW₀ and
Δ²W₀, the first and second differences at this point, are obtained by cross-multiplying
a₀, a₁ and a₂ by the coefficients of the rows in the square, and summing.
W₁₆ is obtained as a check in a similar manner to W₀, but with alternate
positive and negative signs in the summation. The sum of squares of residuals,
Σ(w − W)², furnishes a check on the calculations at the bottom of the table,
wherein Σ(w − W)² is obtained by repeated subtraction of terms dependent on the
mean and the linear and parabolic terms. The number in the numerator here is the
numerator of a already referred to. That the linear and parabolic terms are both
significant is demonstrated by separate comparisons of the quantities 40341.30
and 307.92, with one degree of freedom each, with the residual mean square
1.4671, having 14 degrees of freedom, by means of the z-test. As a matter of fact,
the cubic term is also significant, but not the quartic term, as a continuation of
the calculations will show. For present purposes, however, attention has been
concentrated, in the agricultural problem, on g and h only.
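
The repeated subtraction at the foot of Table I, and the z-test that follows it, may be sketched as below. The numerators and divisors are those of Table I; `fisher_z` is a hypothetical helper that takes z as half the natural logarithm of the ratio of two mean squares, and no tabulated significance points are assumed.

```python
import math

sum_w2 = 277850.00        # sum of squares of the 17 observed weights
terms = [                 # (numerator of a_r, sum of squares of T_r), r = 0, 1, 2
    (2008, 17),
    (8114, 1632),
    (4635, 69768),
]

remaining = sum_w2
reductions = []
for numerator, sum_t2 in terms:
    reduction = numerator ** 2 / sum_t2   # sum of squares removed by this term
    reductions.append(reduction)
    remaining -= reduction

residual_df = 17 - len(terms)             # 14 degrees of freedom
residual_ms = remaining / residual_df

def fisher_z(ms_effect, ms_error):
    """Half the natural log of the variance ratio."""
    return 0.5 * math.log(ms_effect / ms_error)

z_linear = fisher_z(reductions[1], residual_ms)
z_parabolic = fisher_z(reductions[2], residual_ms)
```

Each z is then referred to the table of z for 1 and 14 degrees of freedom; continuing the list `terms` with the cubic and quartic numerators would extend the test to those terms.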


ANALYSIS OF AVERAGE GROWTH RATE 


The figures which now require to be analysed are given in Table II, together
with the initial weight w₀, in the order in which the pigs were arranged for
feeding, and the sex and feeding treatment are indicated by letters. (H = hog,
G = gilt; A = low protein, B = medium protein, C = high protein.) Each pen
contained six animals, three hogs and three gilts, all from the same litter. These
were allowed to run together, but at feeding time they were segregated into boxes,
in the order shown, for individual feeding. The five pens represent five different
litters. A further complication of design is that in each pen three animals were
selected from the litter which were above average weight at weaning (heavy),
while the other three were below average (light). The group of three, to which the
three feeding treatments were given, were either all hogs or all gilts, with the
exception of pen IV, where the allocation was as indicated in Table II. Since there
are five pens, the heavy and light lots are not evened up as regards sex, and this
may introduce complications in any sex comparison which may be made. It was
not the primary object of the experiment to make these sex comparisons, but
evidently any information which can be learnt on this point will be of general
interest.

TABLE II

Data for analysis

Pen    Group    Treatment   Sex    w₀       g        h
I      Heavy    A           G      48      9.94    0.199
                B           G      48     10.00    0.146
                C           G      48      9.75    0.136
       Light    C           H      48      9.11    0.139
                B           H      39      8.51    0.154
                A           H      38      9.52    0.209
II     Light    B           G      32      9.24    0.147
                C           G      28      8.66    0.181
                A           G      32      9.48    0.194
       Heavy    C           H      37      8.50    0.144
                A           H      35      8.21    0.119
                B           H      38      9.95    0.178
III    Light    C           G      33      7.63    0.176
                A           G      35      9.32    0.176
                B           G      41      9.34    0.182
       Heavy    B           H      16      8.43    0.171
                C           H      42      8.90    0.155
                A           H      41      9.32    0.176
IV     Heavy    C           G       0     10.37    0.207
                A           H      48     10.56    0.126
       Light    A           G      46     10.98    0.193
                B           H      40      8.86    0.157
                C           H      42      9.51    0.130
V      Light    B           G      37      9.67    0.192
                A           G      32      8.82    0.199
                C           G      30      8.57    0.189
       Heavy    B           H      40      9.20    0.192
                C           H      40      8.76    0.177
                A           H       3     10.42    0.200

w₀ = initial weight at week 0 in lb.
g = average growth rate in lb. per week.
h = ½ (rate of change of growth rate in lb. per week per week).

From the analysis of variance point of view we can evidently eliminate pen 
differences as irrelevant to the food comparisons we wish to make. We then have 
the choice between assembling the data in food and heavy-light groups, studying 
separately the effect of food, the difference between heavy and light, and the 
interaction between these factors, or arranging the data in food and sex groups, 
with a similar analysis. Evidently the sex comparison is confounded with the 
heavy-light comparison in a way that makes it impossible to say that any sex 
difference found is independent of heavy-light differences and vice versa. Since,
however, we may make use of the initial weights in an analysis of co-variance to 
examine the effects existing in groups of animals with the same initial weight, this 
complication need not disturb us. We may therefore analyse the data for any 
particular variable into a pen or litter effect (4 degrees of freedom), a food effect 
(2 degrees of freedom), a sex comparison (1 degree of freedom), the interaction of 
food and sex effects (2 degrees of freedom), and the residual or error (20 degrees of 
freedom). For the growth rate g the analysis is as follows: 


TABLE III

Analysis of variance of growth rate

Variation due to    D.F.    Sum of squares    Mean square
Pens                  4         4.8518           1.2130      z = 0.5754  S
Food                  2         2.2686           1.1343      z = 0.5419  NS
Sex                   1         0.4344           0.4344
Interaction           2         0.4761           0.2380
Error                20         8.3144           0.4157
Total                29        16.3453

Standard error per pig = √(0.4157) = 0.6194, or 6.66 % of the mean growth
rate per pig (9.304 lb. per week).


Comparison of the mean squares for the various effects with that for error is
made by the z-test. S denotes significance at the 5 % level, and NS denotes “not
significant”. The differences between the average growth rates for the different
pens are significant, which shows that different litters have grown at different
rates, though this may quite well be due to the fact that their initial weights were
very different. The differences between the average growth rates for the food
treatments A, B and C are not significant on the 2 degrees of freedom, but if, as
in the previous study, we take out the single degree of freedom for the “principal
effect”, the sum of squares for which is obtained as one-twentieth of the square
of the difference between the A and C totals, which are 96.49 and 89.76 respectively,
we get the following result:


TABLE IIIa

                     D.F.    Mean square
Principal effect       1        2.2646      z = 0.8876  S
Rest                   1        0.0040
Error                 20        0.4157
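
The division of the two degrees of freedom for food into a principal effect and a rest may be sketched as a pair of orthogonal contrasts among the treatment totals. The A and C totals are those quoted in the text; the B total of 92.88 is an assumption, taken as ten times the mean for B given in Table IV, and the function name is hypothetical.

```python
# Treatment totals of g over ten pigs each; the B total is assumed from the B mean.
totals = {"A": 96.49, "B": 92.88, "C": 89.76}
pigs_per_treatment = 10

def contrast_ss(coeffs, totals, r):
    """Sum of squares (1 d.f.) for a contrast among totals of r observations each."""
    value = sum(k * totals[name] for name, k in coeffs.items())
    return value ** 2 / (r * sum(k * k for k in coeffs.values()))

# Linear contrast across equally spaced protein levels: (A - C)^2 / 20.
principal = contrast_ss({"A": 1, "B": 0, "C": -1}, totals, pigs_per_treatment)
# Remaining degree of freedom: (A - 2B + C)^2 / 60.
rest = contrast_ss({"A": 1, "B": -2, "C": 1}, totals, pigs_per_treatment)
```

The two pieces add up, within rounding, to the food sum of squares of 2.2686 on 2 degrees of freedom, so nothing is gained or lost by the division; only the allocation of the degrees of freedom changes.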
In doing this we are taking account of the additional fact that the three food
totals, whose variation is given in Table III, are arranged in order of increasing
protein percentage in the food. The significance now obtained, combined with the
non-significance of the “rest”, shows that there has been on the average a linear
decrease in the growth rate with increasing protein percentage in the ration, and
the result of the experiment may be summarized so far in the following table of
average results:


TABLE IV

Summary of results: Growth rate

Food treatment       A         B         C        Mean     Standard error
Lb. per week       9.649     9.288     8.976     9.304        0.1959
%                  103.7      99.8      96.5     100.0          2.11

The growth rate has dropped from A to C by 7.2 %, a figure which has a standard
error of 3.0.
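
The standard errors in Table IV follow from the standard error per pig in the usual way. A brief sketch, taking the standard error per pig of 0.6194 as printed and assuming the two treatment means are independent with equal standard errors:

```python
import math

se_per_pig = 0.6194       # as printed under the analysis of variance of g
general_mean = 9.304      # lb. per week
pigs_per_treatment = 10

# Standard error of a mean of ten pigs, in lb. per week and as a percentage.
se_mean = se_per_pig / math.sqrt(pigs_per_treatment)
se_mean_pct = 100 * se_mean / general_mean

# Percentage drop from A to C, and its standard error as a difference of two means.
drop_pct = 103.7 - 96.5
se_drop_pct = round(se_mean_pct, 2) * math.sqrt(2)
```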

The interesting thing is that while the accuracy, as represented by the figure 
of the standard error per pig, is of the same order as that of the live-weight gains 
of the previous study (6.66 % as compared with 6.51), we have been able to detect
a significant effect with the growth rate measurements which was only demonstrated
with the live-weight gain figures after account had been taken of initial
weights by covariance. This leads us to examine whether the food effect may not 
be more firmly established in its significance by taking account in the present 
study of the initial weights, since it may possibly have been considered that we 
were straining the analysis somewhat in the previous study to demonstrate the 
significance of what was after all only a small effect, though, of course, it has 
practical consequences of importance. 


ANALYSIS OF COVARIANCE 


The analysis of covariance procedure should by now be reasonably familiar 
to everyone, so that no apology is needed for presenting the results in a series of 
tables without lengthy explanation. Reference may be made to Wishart & 
Sanders (1935) by those requiring such explanation. Analysing the sums of squares 
and products of the figures for initial weight (w₀) and growth rate (g) in Table II,
along the lines of Table III, we have the following table:
TABLE V

Analysis of variance and covariance. Initial weight and growth rate

Variation due to    D.F.     (w₀²)      (w₀g)      (g²)      b = (w₀g)/(w₀²)    (w₀g)²/(w₀²)
Pens                  4     605.87     39.905     4.8518
Food                  2       5.40     −0.147     2.2686
Sex                   1      32.03     −3.730     0.4344
Interaction           2      22.47      3.112     0.4761
Error                20     442.93     39.367     8.3144        0.08888            3.4989
Total                29    1108.70     78.507    16.3453

Food + error         22     448.33     39.220    10.5830                           3.4310
Sex + error          21     474.96     35.637     8.7488                           2.6739


That the regression of growth rate on initial weight is significant is shown by 
the following test: 
TABLE Va

Test of regression

Variation due to    D.F.    Sum of squares    Mean square
Regression            1         3.4989           3.4989      z = 1.3126  SS
Deviations           19         4.8155           0.2534
Total                20         8.3144

SS = significant at 1 % level.
Standard error per pig (residual) = 0.5034 or 5.41 %
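
The test of regression uses only the error line of Table V. A minimal sketch, with illustrative variable names and with z again taken as half the natural logarithm of the ratio of mean squares:

```python
import math

# Error line of Table V: d.f., sum of squares of w0, sum of products, sum of squares of g.
error_df, exx, exy, eyy = 20, 442.93, 39.367, 8.3144

b = exy / exx                              # regression of growth rate on initial weight
ss_regression = exy ** 2 / exx             # 1 degree of freedom
ss_deviations = eyy - ss_regression        # error_df - 1 degrees of freedom
ms_deviations = ss_deviations / (error_df - 1)

z = 0.5 * math.log(ss_regression / ms_deviations)
r = exy / math.sqrt(exx * eyy)             # correlation within the error line
se_residual = math.sqrt(ms_deviations)     # standard error per pig after regression
```

One degree of freedom passes from error to the regression, which is why the deviations carry 19 rather than 20.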


To test for the food effect, corrected for initial weight, we proceed as follows, 
by a process of subtraction of entries in the last column of Table V from the 
corresponding entries in the (g?) column: 

TABLE Vb

Analysis of residual variance: Food

                  D.F.    Sum of squares    Mean square
Food + error       21         7.1520
Error              19         4.8155           0.2534
Difference          2         2.3365           1.1682      z = 0.7641  S

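
The subtraction used for the adjusted food effect applies equally to any other line of Table V. The sketch below uses hypothetical helper names and the entries of Table V; it carries out the test for food and, by the same steps, for sex.

```python
import math

def residual_ss(sxx, sxy, syy):
    """Sum of squares of y remaining after linear regression on x."""
    return syy - sxy ** 2 / sxx

def adjusted_test(effect_plus_error, error, effect_df, error_df):
    """Adjusted sum of squares for an effect and its z against residual error."""
    resid_error = residual_ss(*error)
    difference = residual_ss(*effect_plus_error) - resid_error
    ms_effect = difference / effect_df
    ms_error = resid_error / (error_df - 1)    # one d.f. lost to the regression
    return difference, 0.5 * math.log(ms_effect / ms_error)

# (sum of squares of w0, sum of products, sum of squares of g) from Table V.
error = (442.93, 39.367, 8.3144)
food_plus_error = (448.33, 39.220, 10.5830)
sex_plus_error = (474.96, 35.637, 8.7488)

food_ss, food_z = adjusted_test(food_plus_error, error, effect_df=2, error_df=20)
sex_ss, sex_z = adjusted_test(sex_plus_error, error, effect_df=1, error_df=20)
```

The regression is fitted afresh to each combined line, so the difference of the two residuals measures the effect at a common initial weight.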
We are thus able to assert, with greater confidence than before, since it is not
necessary to split up the 2 degrees of freedom, that the food effect is significant.
Correcting the three mean growth rates for variable initial weight by subtracting
b(w₀ − w̄₀), where b is the regression coefficient calculated from the error term
of Table V, while w₀ is now the mean initial weight for treatments A, B or C,
w̄₀ being the general mean, we may summarise the results as follows:


TABLE VI

Summary of results: Corrected growth rate

Food treatment       A         B         C        Mean     Standard error
Lb. per week       9.676     9.235     9.003     9.304        0.1592
%                  104.0      99.3      96.8     100.0          1.71


The percentage drop from A to C is unaltered (for in fact the initial weights for
these two groups were the same), but the standard error of the three figures is
reduced from 2.11 to 1.71, which accounts for the greater significance of the
fall.

It will be noted that the gilts are lighter in initial weight than the hogs, but
have the higher growth rate, though neither effect is significant. In view, however,
of the positive correlation between growth rate and initial weight (from Table V
b is 0.0889, corresponding to an r of 0.649), it is of interest to examine whether a
significant sex difference emerges after correction for initial weight. The test is as
follows, utilizing Table V:


TABLE Vc

Analysis of residual variance: Sex

                 D.F.    Sum of squares    Mean square
Sex + error       20         6.0749
Error             19         4.8155           0.2534
Difference         1         1.2594           1.2594      z = 0.8017  S


The sex effect is now significant at the 5 % level, and correcting for initial
weight the mean growth rates for hogs and gilts separately we have the following
result:

TABLE VIa

Summary of results: Corrected growth rate

Sex               Hogs      Gilts     Mean     Standard error
Lb. per week     9.092      9.517     9.304        0.1300
%                 97.7      102.3     100.0          1.40


The difference between the growth rates for hogs and gilts is 0.425 lb. per week
in favour of the gilts, which difference has a standard error of 0.1903, calculated
as the square root of

$$0.2534\left(\frac{2}{15} + \frac{2.06^2}{442.93}\right).$$

0.2534 is the error mean square residual, while we are examining the difference
between two means of fifteen pigs each. 2.06 is the mean difference in initial weight
between hogs and gilts, while 442.93 is the error sum of squares for initial weight
from Table V.* On a percentage basis, the difference between hogs and gilts is
4.6, with a standard error of 2.05. The experiment was not specifically designed to
examine sex differences in the growth of the pigs, nor is it known how far such
differences in growth rate during what is after all only the early part of the normal
pig’s life (though it is the whole of the life of the pig destined for the bacon
factory) are matters of common knowledge. Nevertheless there seems to be little
doubt about the effect in the present case.
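
The standard error of the corrected sex difference may be verified as follows. The sketch uses only figures quoted in the text and in Tables V and VIa; the variable names are illustrative.

```python
import math

residual_ms = 0.2534      # error mean square after regression on initial weight
pigs_per_sex = 15
w0_difference = 2.06      # mean difference in initial weight, hogs and gilts, in lb.
error_ss_w0 = 442.93      # error sum of squares for initial weight
general_mean = 9.304      # lb. per week

# The second term allows for the sampling error of the regression coefficient b.
variance = residual_ms * (2 / pigs_per_sex + w0_difference ** 2 / error_ss_w0)
se_difference = math.sqrt(variance)

difference = 9.517 - 9.092                 # corrected means, gilts minus hogs
difference_pct = 100 * difference / general_mean
se_pct = 100 * se_difference / general_mean
```

The term in the initial-weight difference is small here, adding about 0.0096 to 2/15 inside the bracket, but it would matter more had the two sexes differed widely in initial weight.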

If pen differences are examined in the same way, it will be found that the 
residual mean square, after correcting for initial weight, is not significant 
(z = 0.4225, n₁ = 4, n₂ = 19). This confirms the view that the significant pen
differences in growth rate are a consequence of the very different average initial 
weights at which the different litters entered the experiment, and there is no 
evidence that rate of growth is a litter characteristic of any particular significance. 


ANALYSIS OF RATE OF CHANGE OF GROWTH RATE


A similar analysis to that of growth rate may be carried out on the parabolic
term h of the curve fitted to the weight measures. The analysis of variance is
shown in Table VII. It is clear, on examination of this table, that the only
significant effect is that of sex. This is shown in Table VIII.


* See Wishart and Sanders (1935, p. 54).
TABLE VII

Analysis of variance of rate of change of growth rate

Variation due to    D.F.    Sum of squares    Mean square
Pens                  4       0.0034835        0.0008709
Food                  2       0.0012578        0.0006289
Sex                   1       0.0030603        0.0030603     z = 0.8060  S
Interaction           2       0.0008078        0.0004039
Error                20       0.0122093        0.0006105
Total                29       0.0208187

Standard error per pig = √(0.0006105) = 0.02686, or 15.63 % of the mean, 0.1719.


TABLE VIII

Summary of results: Rate of change of growth rate

Sex                  Hogs       Gilts      Mean      Standard error
h {lb./(week)²}     0.1618     0.1820     0.1719        0.00638
%                    94.1      105.9      100.0           3.71


Not only have the gilts shown a higher average growth rate than the hogs
(when corrected for initial weight), but they now show a higher rate of change of
growth rate, i.e. there is a greater degree of curvature in the growth figures. The
difference in favour of the gilts is 11.8 %, with a standard error of 5.25.

Finally we may examine the rate of change figures in relation to initial weight.
The table is as follows:

TABLE IX

Analysis of variance and covariance. Initial weight and rate of change of growth rate

Variation due to    D.F.     (w₀²)       (w₀h)        (h²)       b = (w₀h)/(w₀²)    (w₀h)²/(w₀²)
Pens                  4     605.87     −0.18386    0.0034835
Food                  2       5.40      0.0117     0.0012578
Sex                   1      32.03     −0.3131     0.0030603
Interaction           2      22.47     −0.1293     0.0008078
Error                20     442.93      0.10186    0.0122093        0.00023           0.0000234
Total                29    1108.70     −0.5127     0.0208187

r(w₀, h) = 0.0438  NS

That the regression, however, of rate of change of growth rate on initial weight
is not significant is shown by the following test:


TABLE IXa

Test of regression

Variation due to    D.F.    Sum of squares    Mean square
Regression            1       0.0000234        0.0000234     NS
Deviations           19       0.0121859        0.0006414


This being so, we are not likely to add to the information already obtained by 
examining the various effects when corrected for initial weight. No improvement, 
for example, is shown in the significance of the sex comparison. The downward 
trend shown in the figures of rate of change of growth rate with increasing protein 
percentage in the ration, while suggestive of what may be happening, is definitely 
not significant, even if the principal effect, with 1 degree of freedom, be isolated 
from the remainder. 


DISCUSSION 


By considering the actual weekly figures of the weights of the thirty pigs given 
over to this nutrition experiment, we have been able to demonstrate the signi- 
ficance of the fall in average growth rate with increasing protein percentage in the 
ration, and the sex difference in favour of the gilts in the rate of change of the 
growth rate, without making any allowance for initial weight. This contrasts with 
the previous study where only live-weight gain was considered. When the 
figures of mean growth rate are corrected for initial weight, the significance of the 
food effect is stronger, and a sex difference in favour of the gilts emerges as 
significant. Not only is the taking into consideration of the initial weights valuable 
from the point of view of reaching such conclusions, but it seems to be necessary 
to do so if we are to disentangle the sex comparison from the heavy-light compari- 
son with which it is to some extent confounded by the design adopted for the 
experiment. 

Were the decisions reached by separate examination of the growth rate and 
change of growth rate figures not so clear-cut, it might be necessary to take these 
figures (g and h of Table II) together in a simultaneous analysis of variance and
covariance, and reach a single test of significance of the effect of food (or of sex) on 
both simultaneously, after the manner suggested by Bartlett (1934). The method 
outlined in this paper of calculating a number of quantities to express the growth 
of the pigs would seem, in fact, to be well adapted to this method of analysis, 
since we are seeking the effect of the food ration on growth, which is expressed by 
both of the variables g and h (and possibly by the cubic term as well). Not only
so, but the fact that it is desirable to take initial weights into account suggests
that Bartlett’s method should be applied to the partial variables derived from g
and h when w₀ is held constant, and a test of significance derived in the same sort
of way as in the usual covariance analysis. We have, in fact, a case of multiple 
dependent variables, with one independent variable, a special case of the kind 
envisaged by Day & Fisher (1937). This point is not pursued in the present 
paper, but is commended to the attention of investigators. 


I. INTRODUCTION

One of the important problems in agricultural science is the breeding and
selection of new families or varieties which, for some economic reasons, are better 
than those already known. The desired properties of the plants are usually very 
complex and include a combination of various characters, yielding capacity, 
resistance to diseases, etc. However, to simplify the problem, we shall assume 
below that there is just one single character in plants, the importance of which 
is overwhelming and which it is desired to better by breeding new varieties. 

The process of breeding new varieties depends on various circumstances, such 
as whether the plant under consideration is self-fertilizing or not. In the following 
I shall consider problems arising in the breeding of sugar beet, with a view to
increasing their sugar content. It seems probable that similar problems are also 
met with in many other cases. It will be useful to call attention to two properties 
of sugar beet: (1) Sugar beet is a cross-fertilizing plant, which makes it practically 
impossible to obtain anything like a pure line. (2) The vegetation period of sugar 
beet covers 2 years. During the first year, a seedling produces a root rich in sugar 
but no seeds. The seeds are produced during the second year of life of the plant, 
when sugar stored in the root is used as foodstuff. 

Before I describe the problem to be dealt with below, it will be useful to give 
some idea of the process of breeding. This is roughly explained in Fig. 1, which, 
however, omits certain details and devices which are used by particular breeders
and are not relevant from the point of view of the problems I am going to treat.

[Fig. 1. Diagram of the five steps of breeding.
I. Crosses of plants chosen out of some old varieties (A₁ × A₂, B₁ × B₂, ..., C₁ × C₂).
II. Individual selection. Points represent single plants; dots in circles represent selected plants.
III. Isolated plots with parent plants of new varieties.
IV. Plots reproducing and multiplying separately the progenies of single parent plants, i.e. new varieties.
V. Field trials comparing new varieties with the standard.]

Fig. 1 shows the subdivision of the process of breeding into five steps. First
certain individual roots A₁, A₂; B₁, B₂, etc., are selected, planted in pairs and
allowed to cross-fertilize. It is hoped that some of their progeny will possess an
increased sugar content. Each of the roots A₁, A₂, etc., being a hybrid with respect
to a great number of genes, their progeny will not be homogeneous but will be a
mixture of a great number of types of various properties. Therefore a selection
from among them is needed. The second step consists in planting the seeds
obtained from the crosses on a larger field. In the autumn all roots are lifted and
out of each of them a small section is cut out and analysed for sugar content, which
does not prevent the root from producing seed if planted again the next season.
Roots with small sugar content are discarded and others, promising a sweet
progeny, selected for further breeding. This step is called individual selection.
The third step in breeding consists in planting the selected roots in isolated
plots, so as to prevent, as far as possible, cross-fertilization. Each of these roots
generates a new family of beet and it will be called the parent plant of this variety.
The number of seeds it produces is, of course, rather small, and the fourth step,
taking up at least two years, consists in multiplying the seeds of the new variety.
The first progeny of the parent plant is sown on a separate plot and allowed to
reproduce.

The fifth and final step consists in the test of the results of all preceding steps:
all the new varieties are compared in field trials with some established standard. 
Those which are found to exceed the standard in sugar content are further multi- 
plied and put on the market. The others are discarded. 

It is obvious that at all stages described above the breeder is faced with various 
risks of error. 

(1) His choice of roots A₁, A₂; B₁, B₂, etc., used for the cross may be unlucky,
and practically all the genetical types produced may have no advantages over the 
existing standard. This problem, however, lies outside the scope of the present 
paper. 

(2) Even if the cross was a success the breeder may be wrong in his step I 
and fail to select the proper individuals from which to breed the new varieties. 
It must be remembered that the individual variation from plant to plant is very 
large, and it may easily happen that genetically better plants through environ- 
mental conditions will be less promising than some of the worse ones. The obvious 
remedy against overlooking the best genetical types in the process of individual 
selection is to breed from as many individuals as possible. This is actually often 
done, but there is a limit to this device imposed by the difficulty in comparing 
large numbers of new varieties with the standard. 

(3) Even if both the cross and the selection of parent plants were successful, 
the breeder may be unlucky in his field trials. It is known that their accuracy is 
limited, and it may happen that, through the unavoidable experimental error, 
the successfully selected new varieties will be judged inferior to the standard, and 
consequently discarded. In such a case all the previous efforts and expense in 
breeding and selection of the new varieties would be wasted. 

It is obvious that we can avoid this danger by increasing the accuracy of the 
field trials. Here, however, we come into a conflict with (2). An increased accuracy 
of field trials means either an increase of the number of replications or an improve- 
ment in the method, which, in practice, always means additional expense. If 
we increase the number of varieties to be compared with the standard, this means 
another additional expense. So the breeder will ask the question, what is more 
important, to have more new varieties and test them superficially, or fewer 
varieties and test them with a great accuracy? This is the problem which will be 
dealt with in the present paper. 


II. THE GENERAL PROBLEM 
(a) Statement of the general problem 
In order to make clear the general problem, consider some particular varieties 
to be compared with a standard, and denote by X the true excess in sugar 
content (true excess, for short) which one of these is able to give over the standard 
in some particular conditions of soil, treatment and weather. We can never know
X, but a field trial may give its estimate, x, which is unavoidably affected by an 
experimental error. If X is greater than zero, the new variety will be considered 
as successfully selected. But X may be greater than zero while its estimate, x, 
owing to the experimental error, may be negative or, even if it is positive, it may 
be so small that the experimenter will doubt whether its excess over zero does 
indicate that X is also positive. 

If the magnitude of X were known and also the accuracy of the field trial,
then it would be possible to calculate the number of replications which are needed
to ensure that the probability of the trial detecting the fact (in the sense described
on p. 34 below) that X is greater than zero will be as large as desired.

For this purpose it is only necessary to make use of the tables which give the
probability of second kind errors, in connection with “Student's” test (see (1) and
(2)),* i.e. give the probability that an experiment will fail to detect the advantage
in sugar-yielding capacity of the new variety over the standard when it is as large
as, say, X'. Any seed-breeding station, with an established method of
experimentation, can use the results of previous experiments to estimate roughly the
standard error per plot to be expected in future, and apply the tables mentioned
to calculate how many replications should be made in order that the probability
of detecting such varieties which exceed the established standard by any amount
X' is as large as, say, 0.8, 0.9, etc. Using this number of replications, the station
would feel confident of discovering in 80 or 90% of trials the varieties exceeding
the standard by X'.
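
The calculation of the number of replications may be illustrated by a rough sketch. It is not the tabulated power function of “Student's” test referred to above: it assumes, for simplicity, that the standard error per plot is known exactly (a normal approximation), and that the estimated excess is the difference between two means of n plots each. The function name and its arguments are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def replications_needed(excess, sd_per_plot, alpha=0.05, power=0.8):
    """Approximate number of replications for a one-sided test of X > 0.

    Assumption (not from the paper): the estimate x is the difference of two
    means of n plots each, so its standard error is sd_per_plot * sqrt(2 / n),
    and sd_per_plot is treated as known (normal approximation).
    """
    if excess <= 0 or sd_per_plot <= 0:
        raise ValueError("excess and sd_per_plot must be positive")
    z = NormalDist()
    z_alpha = z.inv_cdf(1.0 - alpha)   # critical value of the one-sided test
    z_power = z.inv_cdf(power)         # quantile giving the required power
    n = 2.0 * ((z_alpha + z_power) * sd_per_plot / excess) ** 2
    return max(2, ceil(n))


if __name__ == "__main__":
    for wanted in (0.8, 0.9):
        print(wanted, replications_needed(1.0, 1.0, alpha=0.05, power=wanted))
```

Because the estimation of the standard error is ignored, the numbers so obtained are somewhat smaller than those the exact tables would give when the degrees of freedom are few.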

This, however, does not solve the problem, because we do not know how 
frequently the new varieties do exceed the standard by the fixed amount X’. 
Fixing X’ arbitrarily in advance we may fix it so large that the new varieties will 
practically never give an excess exceeding X’ and, thus, the further calculations 
will be actually useless. In order to obtain useful results, we must know not only 
how frequently an excess of a given size over the standard will be detected by an 
experiment of a given accuracy, but also, how frequently excesses of all possible 
sizes are actually met with in the usual process of breeding and selection. If we
know that applying our customary methods we shall usually succeed in selecting
varieties which exceed the standard by $X_1$, $X_2$, $X_3$, ... with frequencies, say,
$P_1$, $P_2$, $P_3$, ..., we may then apply the tables of the second kind of errors to each
of these categories and, thus, see what would be the practical effect of applying
any fixed number of replications to the new varieties which are usually presenting
themselves for comparison.

It follows that for the solution of the problem of the relation between the
number of new varieties and the accuracy of the field trials, the knowledge of the
probability that an experiment will detect any specified excess in sugar content
is not sufficient. To solve the problem we must also know the distribution of these
true excesses over the standard variety which the new varieties may show. It is
obviously impossible to make any sure prediction about this distribution, but we
may estimate what it has been in the past, and use this estimate to give an idea
of what may happen in the future.

* Small figures in brackets refer to literature quoted at the end of the paper.

The method of estimating the distribution of the true excesses over the 
standard, shown by a number of varieties in a series of experiments already 
carried out, is the first problem to be considered. 


(b) The population of the new varieties 


Consider a series of N experiments with the same design and the same number
of replications, each comparing the same number, say k, of varieties

$$V_{i1}, V_{i2}, \ldots, V_{ik} \qquad (1)$$

with a standard $V_0$. Let

$$x_{i1}, x_{i2}, \ldots, x_{ik} \qquad (2)$$

be the estimates of the excesses of the varieties (1) over the standard $V_0$, obtained
in the ith experiment. It will be noticed that there are Nk different varieties
compared with the same standard. Hence, altogether, there will be Nk different x's.
It is usually assumed that within any single experiment the standard error, $\sigma_i$, of
the estimate, $x_{ij}$, is the same for all varieties. We shall adopt this hypothesis and
denote by $s_i^2$ the unbiased estimate of $\sigma_i^2$.


Behind these experimental results, there will be true excesses

$$X_{i1}, X_{i2}, \ldots, X_{ik} \qquad (3)$$

of the varieties compared over the standard. These true excesses (3) depend upon
the varieties chosen for trials, which may be regarded as a random sample
drawn from a population $\pi$ described as follows.

Consider first a population, $\pi'$, of true excesses over the standard which would
be observed if all the individuals coming from a cross (or, perhaps, crosses)
performed by the breeder were used as parent plants of new varieties. Denote by
p'(X) the distribution function of the X in that population $\pi'$.

Actually, the breeder makes a selection of the individuals from which he
intends to breed, and tries to select the best ones. In this, however, he must be
sometimes wrong, and we may consider a function f(X) representing the
probability that an individual of the population $\pi'$, capable of generating a variety
with the true excess over the standard equal to X, will be actually selected by the
breeder.

The functions p'(X) and f(X) together determine a certain imaginary
population, the one we have denoted by $\pi$, of the new varieties which, under the
usual conditions of selection, are liable to be compared with the standard. The
true distribution of X in this population is, say,


$$p(X) = \text{const.} \times p'(X)\,f(X), \qquad (4)$$

and the varieties which were actually compared with the standard in a particular
year may be considered as a random sample from the population $\pi$.
It will be noticed that the population $\pi$ and the distribution p(X) depend on
the method by which the parent plants are selected from the population $\pi'$. If,
for instance, we decide to diminish or to increase the number of the parent plants
to be selected, then the distribution p(X) will be changed also. The same will
happen if the principle on which the parent plants are selected is altered. It
follows that if it be possible to estimate p(X), we may learn something about the
suitability of different alternative methods of selecting parent plants.
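
The dependence of p(X) on the method of selection may be made concrete by a small discrete sketch. It takes p(X) as proportional to the product p'(X) f(X), normalised to unit total. The numerical values of p'(X) and f(X) below are invented for illustration only and are not taken from any experiment.

```python
def selected_distribution(p_prime, f_select):
    """Return p(X) proportional to p'(X) * f(X), normalised to sum to 1.

    p_prime  : dict mapping a value of X to its frequency in the population pi'
    f_select : dict mapping the same values of X to the chance of selection
    """
    weights = {x: p_prime[x] * f_select[x] for x in p_prime}
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("no individual has a positive chance of selection")
    return {x: w / total for x, w in weights.items()}


# Hypothetical figures: a symmetrical population pi' and a breeder who is
# more likely, but not certain, to pick the individuals with larger X.
p_prime = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}
f_select = {-2: 0.05, -1: 0.1, 0: 0.3, 1: 0.6, 2: 0.9}
p = selected_distribution(p_prime, f_select)

if __name__ == "__main__":
    for x in sorted(p):
        print(x, round(p[x], 4))
```

Altering f_select, which stands for the principle of selection, shifts the resulting p(X) while p'(X) stays the same.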

(c) The probability P of detecting a “best” variety

We are now interested in the distribution p(X) of the X's in the population $\pi$.
Once this distribution is known we can see roughly whether any given size X' of
the true excess X is likely to be met with in practice. Suppose that the true
distribution of X's is represented in Fig. 2, where the range of X extends from
a to b. It will be seen that it would be useless to aim in our experiments at the
detection of varieties with X exceeding b. In fact, such varieties will never occur.

[Fig. 2. Curve of p(X) against X over the range a to b.]

Therefore, the progress in plant breeding depends upon the possibility of
identifying those varieties for which X is positive but does not exceed b. In any practical
case it will be possible for the breeder to fix a certain value c, lying somewhere
between 0 and b, such that he would consider it most desirable to detect new
varieties with excesses exceeding c. He may, then, adjust the number of
replications of his trials so as to have a fair chance of detecting such varieties; for
convenience they may be called the “best” varieties. Suppose that we know
p(X) and that c is fixed; denote by P the probability that a variety with excess
exceeding c will be detected in a trial of given accuracy. P is easily seen to be given
by the formula

$$P = \int_c^b p(X)\,B(X)\,dX, \qquad (5)$$

where B(X) is the probability that the field experiment will detect the fact that
X > 0.


(d) Kołodziejczyk's results on the power function of “Student's” test
and their application to calculate P


The function B(X), called the power function of the statistical test employed,
is easily obtained from the formula given by Kołodziejczyk (3) and we shall discuss
it below.

Since each $x_{ij}$ is an estimate of $X_{ij}$ it is reasonable to assume that $x_{ij}$ is normally
distributed about $X_{ij}$ with a standard error $\sigma_i$ whose estimate, which has been
denoted by $s_i$, is independent of $x_{ij}$. The joint probability law of $x_{ij}$ and $s_i$ is

$$p(x, s) = \frac{f^{f/2}\, s^{f-1}\, e^{-[f s^2 + (x - X)^2]/2\sigma^2}}{2^{(f-2)/2}\,\Gamma(\tfrac{1}{2}f)\,\sigma^{f+1}\sqrt{2\pi}}, \qquad (6)$$

where f is the number of degrees of freedom used for estimating $\sigma_i$.

The statistical method used in analysing the data obtained from field
experiments consists in testing the hypothesis $H_0$ that $X \le 0$, that is to say, that the
compared variety is not better than the standard. It has been pointed out by
Neyman and Pearson (4) that in testing $H_0$ two kinds of errors should be considered:
the error of rejecting the hypothesis tested when it is true (the first kind of error)
and the error of failing to detect that some alternative is true (the second kind of
error).

Denote by $P_{\mathrm{I}}$ the probability of the first kind of error. We may fix in advance
any number $0 < \alpha < 1$ which we shall call the level of significance, and arrange the
test so that the probability $P_{\mathrm{I}}$ will never exceed $\alpha$. For this purpose it is sufficient
to make a rule of rejecting the hypothesis tested whenever the ratio $t = x/s$ is
greater than $t_\alpha$, where $t_\alpha$ is the value to be found in R. A. Fisher's tables (5) of t
corresponding to $P = 2\alpha$. Below we shall consider two levels of significance,
$\alpha = 0.05$ and $\alpha = 0.01$.

If $H_0$ is not true and X' > 0 is the true value of the excess X, then the chance
of the test detecting the fact that X' is greater than zero is evidently

$$B(X') = \frac{f^{f/2}}{2^{(f-2)/2}\,\Gamma(\tfrac{1}{2}f)\,\sigma^{f+1}\sqrt{2\pi}} \int_0^\infty s^{f-1} e^{-f s^2/2\sigma^2}\,ds \int_{s t_\alpha}^\infty e^{-(x - X')^2/2\sigma^2}\,dx. \qquad (7)$$

Let

$$u = \frac{s\sqrt{f}}{\sigma}, \qquad z = \frac{x - X'}{\sigma}. \qquad (8)$$

Substituting these equations into (7) and (5), we get

$$P = \int_c^b p(X)\,B(X)\,dX = \frac{1}{\sqrt{2\pi}\,2^{(f-2)/2}\,\Gamma(\tfrac{1}{2}f)} \int_c^b p(X)\,dX \int_0^\infty u^{f-1} e^{-u^2/2}\,du \int_{u t_\alpha/\sqrt{f} - X/\sigma}^\infty e^{-z^2/2}\,dz. \qquad (9)$$



In order to evaluate the integral (9) accurately, it would be necessary to know 
the exact nature of the function p (X), and even then the work would probably be 
rather tedious. Since, however, we cannot hope to know the distribution function 
p(X) exactly, the best we can do is to get for it a reasonable approximation, 
using the data of the experiments carried out in previous years. We cannot, 
therefore, obtain an accurate evaluation of P and shall consider an approximate 
method of calculating its value given in (9). This will be done by using the exact
values of B(X) obtained from the tables referred to (1), and the estimated values
of p(X). We shall then apply the simplest quadrature formula,

$$P = \frac{h}{2}\,(y_0 + y_m) + h \sum_{i=1}^{m-1} y_i, \qquad (10)$$

where $y_0, y_1, \ldots, y_m$ are the values of the product p(X) B(X), calculated at a series
of points at equal distance h. The results which it is possible to obtain, using this
quadrature formula, will be sufficiently accurate for practical purposes. The
approximate information which we may have regarding the function p(X) would
not justify the application of any more elaborate method of quadrature.
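
A minimal sketch of the quadrature (10) follows. The function `trapezoid` is the formula itself; the two example functions standing for p(X) and B(X) are hypothetical stand-ins (a normal density and a normal-approximation power curve), chosen only so that the sketch runs, and are not the estimated p(X) or the tabulated power function of the paper.

```python
from statistics import NormalDist


def trapezoid(values, h):
    """Formula (10): h/2 * (y_0 + y_m) + h * (y_1 + ... + y_{m-1})."""
    if len(values) < 2:
        raise ValueError("need at least two ordinates")
    return 0.5 * h * (values[0] + values[-1]) + h * sum(values[1:-1])


def detection_probability(p, power, c, b, m=40):
    """Apply (10) to the product p(X) * power(X) on m equal intervals of [c, b]."""
    h = (b - c) / m
    ordinates = [p(c + i * h) * power(c + i * h) for i in range(m + 1)]
    return trapezoid(ordinates, h)


_UNIT = NormalDist()


def p_example(x):
    # Hypothetical distribution of true excesses: normal, mean 0, s.d. 1.
    return NormalDist(0.0, 1.0).pdf(x)


def power_example(x, se=0.5, alpha=0.05):
    # Hypothetical power curve, treating the standard error `se` as known.
    return 1.0 - _UNIT.cdf(_UNIT.inv_cdf(1.0 - alpha) - x / se)


if __name__ == "__main__":
    print(detection_probability(p_example, power_example, c=0.5, b=3.0))
```

In practice the ordinates would be the products of the estimated p(X) and the tabulated B(X) at the chosen points, passed directly to `trapezoid`.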

It is now clear that the knowledge of the function p (X) is essential from the 
point of view of the problem we are interested in and we shall consider how it 
could be estimated from the results of previous experiments. 


III. METHODS OF ESTIMATING THE DISTRIBUTION OF X’s 


(a) Estimation of p(X) by method of moments 


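Write

$$x_{ij} = X_{ij} + \epsilon_{ij}, \qquad i = 1, 2, \ldots, N; \quad j = 1, 2, \ldots, k, \qquad (11)$$

where $\epsilon_{ij}$ is a random error, which will be assumed to be independent of the X's
and normally distributed about zero with the standard deviation $\sigma_i$. Denote by
$\mathcal{E}(u)$ the expected value of any random variable, u. It is known that

$$\mathcal{E}(\epsilon_{ij}^{2t+1}) = 0, \qquad \mathcal{E}(\epsilon_{ij}^{2t}) = \frac{(2t)!}{2^t\,t!}\,\sigma_i^{2t}. \qquad (12)$$

Let $m'_q$ and $M'_q$ be the qth moments of $x_{ij}$ and $X_{ij}$ respectively. From (11) and (12)

$$m'_q = \mathcal{E}(x_{ij}^q) = \mathcal{E}[(X_{ij} + \epsilon_{ij})^q] = \sum_{0 \le t \le q/2} \frac{q!}{2^t\,t!\,(q - 2t)!}\,M'_{q-2t}\,\sigma_i^{2t}. \qquad (13)$$

Putting q = 1, 2, 3 and 4, we get

$$\begin{aligned}
m'_1 &= M'_1,\\
m'_2 &= M'_2 + \sigma_i^2,\\
m'_3 &= M'_3 + 3M'_1\sigma_i^2,\\
m'_4 &= M'_4 + 6M'_2\sigma_i^2 + 3\sigma_i^4.
\end{aligned} \qquad (14)$$

The left-hand sides of (14) represent the moments, about zero, of the observable
variate $x_{ij}$ and may be estimated from the experimental data. Solving (14) with
regard to $M'_q$ and calculating the central moments $M_q$ of $X_{ij}$ in terms of $\sigma_i^2$ and the
central moments $m_q$ of $x_{ij}$, we obtain

$$\begin{aligned}
M'_1 &= m'_1,\\
M_2 &= m_2 - \sigma_i^2,\\
M_3 &= m_3,\\
M_4 &= m_4 - \sigma_i^2\,(6m_2 - 3\sigma_i^2).
\end{aligned} \qquad (15)$$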


Strictly speaking, equations (15) refer to a particular experiment corresponding
to the subscript i. If, however, the accuracy of all experiments is the same, so that
$\sigma_1 = \sigma_2 = \ldots = \sigma_N = \sigma$, then we can apply the same formulae to all experimental
data available. It will be seen below that the assumption that all $\sigma$'s are equal may
sometimes be reasonable and, therefore, we shall adopt this hypothesis here. If
it is true that $\sigma_1 = \sigma_2 = \ldots = \sigma_N = \sigma$ and the number of degrees of freedom for
obtaining the common estimate of $\sigma$ is sufficiently large, we may replace $\sigma^2$ by
$s^2$, the common unbiased estimate of $\sigma^2$. Hence we can estimate the moments of
$X_{ij}$ from the observed data in N experiments. Having obtained the moments of
$X_{ij}$, we can calculate $\beta_1(X)$ and $\beta_2(X)$ and determine the corresponding
distribution from the Pearson system of curves (6).
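
The corrections (15) are easily put into a short routine. The sketch below assumes a common $\sigma$ replaced by its estimate $s^2$, as described above; the function names are hypothetical. The figures in the usage example are those of Example 1 in Table II below, with $s^2$ = 0.9460 inferred as the difference between the two tabulated second moments.

```python
def central_moments(xs):
    """Return the mean and the 2nd, 3rd and 4th central moments of xs."""
    n = len(xs)
    mean = sum(xs) / n
    m2, m3, m4 = (sum((x - mean) ** k for x in xs) / n for k in (2, 3, 4))
    return mean, m2, m3, m4


def true_moments(mean, m2, m3, m4, s2):
    """Equations (15): moments of X from those of x and the error variance s2."""
    M2 = m2 - s2
    M3 = m3
    M4 = m4 - s2 * (6.0 * m2 - 3.0 * s2)
    return mean, M2, M3, M4


def pearson_betas(M2, M3, M4):
    """Pearson's beta_1 = M3^2 / M2^3 and beta_2 = M4 / M2^2."""
    return M3 ** 2 / M2 ** 3, M4 / M2 ** 2


if __name__ == "__main__":
    # Moments of x for Example 1 (Table II); s2 inferred as 7.8233 - 6.8773.
    mean, M2, M3, M4 = true_moments(0.1405, 7.8233, 0.4027, 148.6389, 0.9460)
    print(M2, M3, M4, pearson_betas(M2, M3, M4))
```

If the estimate $s^2$ exceeds the observed second moment, $M_2$ comes out negative and no curve can be fitted; the routine leaves that check to the user.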


Any method of estimation should be tested to see how far it will give reliable
results. In particular, we want to have some idea as to how accurately we are likely
to estimate p(X), using only a limited number of observations. A special theoretical
inquiry will be needed to study the efficiency of the method described. Until such
work is completed, it was thought useful to test the method empirically, and two
artificial examples were worked out. However, before proceeding to these
examples, I shall describe an alternative method of estimating p(X) due to
Eddington and described by Levy and Roth (7).


(b) Alternative method of estimating p(X) 


At this stage it will be convenient to alter a little the notation concerning the
probability laws. Denote by u(x) the probability law of x, the estimate of X;
p(X), as before, will represent the probability law of the true excesses; $\epsilon$ will
denote the difference between x and X and $p(X, \epsilon)$ the simultaneous probability
law of X and $\epsilon$. We have assumed that the experimental error $\epsilon$ is independent
of X and normally distributed about zero with standard deviation $\sigma$, the value
of which is assumed to be possible to estimate accurately from a large number of
experiments. It follows that

$$p(X, \epsilon) = p(X)\,\frac{1}{\sqrt{2\pi}\,\sigma}\,e^{-\epsilon^2/2\sigma^2}, \qquad (16)$$

where $\sigma$ may be considered as known.

Introduce now a new system of variables

$$X = x + \eta, \qquad \epsilon = -\eta. \qquad (17)$$

The simultaneous probability law of x and $\eta$ will be found as

$$p(x, \eta) = \frac{p(x + \eta)\,e^{-\eta^2/2\sigma^2}}{\sqrt{2\pi}\,\sigma}. \qquad (18)$$

In order to obtain the probability law of x, we have to integrate this expression
with regard to $\eta$:

$$u(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} p(x + \eta)\,e^{-\eta^2/2\sigma^2}\,d\eta. \qquad (19)$$

This corresponds exactly to the first formula in Levy's book (7), p. 157. On the
next page he gives an expansion which makes it possible to calculate the value
of p(X) in terms of the values of u(x) and its successive differences, viz.


9 
oO” 


, ne : 2 . ial 2 
p(X)=u(x)—- 5 A*u (x) + 5 Aku (x) — 9 7.7 4 ) Atw (w)+.... ...(20) 


This method has been tried as an alternative in the examples discussed below. 
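
The idea of correcting u(x) by its successive differences may be sketched as follows. The sketch does not reproduce the expansion in Levy's book; it assumes instead the derivative form of the correction, $p \approx u - \tfrac{1}{2}\sigma^2 u'' + \tfrac{1}{8}\sigma^4 u''''$, which is the formal inverse of the smoothing (19), and replaces the derivatives by central differences at a small step h. The coefficients of a series written in tabular differences may differ. All names are hypothetical.

```python
from math import exp, pi, sqrt


def normal_pdf(x, var):
    """Density of a normal variate with mean zero and variance `var`."""
    return exp(-x * x / (2.0 * var)) / sqrt(2.0 * pi * var)


def second_difference(g, h):
    """Return a function approximating g'' by a central difference of step h."""
    return lambda x: (g(x + h) - 2.0 * g(x) + g(x - h)) / (h * h)


def deconvolve(u, sigma2, h=0.05):
    """Three-term correction: u - (sigma^2 / 2) u'' + (sigma^4 / 8) u''''."""
    u2 = second_difference(u, h)
    u4 = second_difference(u2, h)
    return lambda x: u(x) - 0.5 * sigma2 * u2(x) + 0.125 * sigma2 ** 2 * u4(x)


if __name__ == "__main__":
    # Check on a case with a known answer: true excesses normal with
    # variance 4, errors with variance 1, so x is normal with variance 5.
    u = lambda x: normal_pdf(x, 5.0)
    p_hat = deconvolve(u, 1.0)
    print(p_hat(0.0), normal_pdf(0.0, 4.0), u(0.0))
```

With observed frequencies in place of a smooth u(x) the higher differences are strongly affected by sampling fluctuations, so that only the first few terms are likely to be of use.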


(c) Empirical test of the two methods of estimating p(X) 


Both examples which we shall describe below consist in assuming arbitrary
distributions p(X) and in obtaining by laboratory methods a set of figures which
could be obtained as experimental data if the assumed hypothesis and the
distributions p(X) were in fact true. I started in the two cases with the
assumption that the true distributions, p(X), were represented by the histograms shown
in Figs. 3 and 4. In order to obtain the x's, it was necessary to add to each value
of X a random error $\epsilon$, independent of X and normally distributed about zero.
These were obtained from the tables of normal deviates published by
Mahalanobis (8). These deviates represent what would be the observed values of a
random variable, $\epsilon$, normally distributed about zero with its standard deviation
equal to unity.

Fig. 3. True distribution of X and its estimates.

Fig. 4. True distribution of X and its estimates.
Dashed curve: fitted distribution of x.
(i) Fitted distribution of X (by method of moments).
(ii) Fitted distribution of X (by Levy's formula).

Adding normal deviates to the values of X, I have obtained 100 numbers and
these were then considered as the values of the x's and were used to estimate p(X)
by the two methods described. These methods, however, need also the estimate
of $\sigma^2$. In order to have a situation analogous to that which we have in practice,
I performed another random sampling experiment, and obtained 20 values of $s^2$
by sampling from the known distribution of $s^2$ with a fixed value of $\sigma^2 = 1$ and with
the number of degrees of freedom equal to 25. The arithmetic mean of those $s^2$'s
was used as a common estimate of $\sigma^2$. The same method would be applied in practice
to the data of a series of 20 experiments of equal accuracy, each comparing five
new varieties with a standard in six randomized blocks. Having applied the
methods described to the results of the sampling experiments, the estimates of
p(X) were obtained, which may be compared with the true distribution from
which we started.
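
A sampling experiment of this kind may be imitated as in the sketch below. It is not the procedure actually followed: a pseudo-random generator takes the place of the printed tables of normal deviates and of random numbers, and the values of $s^2$ are drawn as $\sigma^2 \chi^2 / f$ with f = 25. The frequencies are those read for Example 1 of Table I; the function name and the seed are arbitrary.

```python
import random

# Hypothetical distribution of X for Example 1 (symmetrical about zero).
X_VALUES = list(range(-6, 7))
FREQUENCIES = [1, 3, 5, 9, 12, 13, 14, 13, 12, 9, 5, 3, 1]


def simulate(seed=0, sigma=1.0, n_experiments=20, dof=25):
    """Return the true X's, the observed x's and the mean of the s^2 values."""
    rng = random.Random(seed)
    true_x = [x for x, f in zip(X_VALUES, FREQUENCIES) for _ in range(f)]
    rng.shuffle(true_x)
    # Each observed x is a true X plus an independent normal error.
    observed = [x + rng.gauss(0.0, sigma) for x in true_x]
    # s^2 = sigma^2 * chi-square(dof) / dof; chi-square(dof) is a gamma
    # variate with shape dof / 2 and scale 2.
    s2_values = [sigma ** 2 * rng.gammavariate(dof / 2.0, 2.0) / dof
                 for _ in range(n_experiments)]
    return true_x, observed, sum(s2_values) / n_experiments


if __name__ == "__main__":
    true_x, observed, s2_mean = simulate(seed=1)
    print(len(observed), round(s2_mean, 4))
```

The 100 observed values and the mean $s^2$ so obtained are the material to which either method of estimating p(X) would then be applied.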

In order to obtain the set of values of $s^2$, it is necessary to apply the usual
sampling technique with Tippett's random numbers (9) and the distribution of $s^2$,

$$p(s^2) = \frac{f^{f/2}\,(s^2)^{(f-2)/2}\,e^{-f s^2/2\sigma^2}}{2^{f/2}\,\Gamma(\tfrac{1}{2}f)\,\sigma^{f}}. \qquad (21)$$


The distribution, p(X), in Example 1 was assumed to be symmetrical about
zero, while in Example 2 it was asymmetrical. The frequencies are given in
Table I.

TABLE I
Hypothetical distribution of X

X               -6  -5  -4  -3  -2  -1   0   1   2   3   4   5   6   7   8   9  10  11  12
1st example      1   3   5   9  12  13  14  13  12   9   5   3   1
2nd example                  2   6  10  14  14  12  10   8   6   5   4   3   2   2   1   1


Table II gives the values of frequency constants obtained from the observed
values of x's, using for p(X) the formulae (15).

TABLE II
Frequency constants for p(x) and p(X)

                        p(x)                                        p(X)
               Example 1    Example 2                      Example 1    Example 2
$m'_1$            0.1405       2.5850      $M'_1$             0.1405       2.5850
$m_2$             7.8233      11.4415      $M_2$              6.8773      10.3255
$m_3$             0.4027      19.4065      $M_3$              0.4027      19.4065
$m_4$           148.6389     367.9831      $M_4$            106.9186     295.1070
$\beta_1(x)$      0.0003       0.2564      $\beta_1(X)$       0.0005       0.3421
$\beta_2(x)$      2.4191       2.8110      $\beta_2(X)$       2.2606       2.7680
The Pearson Curves fitting p(x) and p(X), found by the method of moments
in the two cases, are as follows: 


Example 1: 


a \ 26644 
iia WOME Ys coe re ea 29 
P (*) . | aaa Sag 
: . X2.= \ 18573 
p(X) = 132480(1— sanes) ae ee aa (23) 


Both curves have their origin at their common mean, 0.1405.
Example 2: 


4-6536 


p (x)= 11-3920(1 + mae ,) (2 Ss aa (24) 


351! ~ 168592 
‘ x 07231 X 3-0168 " 
p(X)=1212929(1+ 5) (1-Day) eee (25) 


The origins of the two curves are at their respective modes, 1.0486 and 0.6400.

In Figs. 3 and 4 the histograms represent the true distributions of X, the
dashed curves correspond to equations (22) and (24), the curves marked (i) to
equations (23) and (25), while the curves marked (ii) represent the estimates of
p(X) obtained by Levy's method. It is seen that in both cases, both methods of
estimating p(X) give satisfactory results.

Of course, the sampling experiments cannot be considered as definite
evidence that a particular method of estimation is satisfactory, however
favourable may be the results. However, the two examples described above seem to be
encouraging, and we may hope that the results obtained below by applying our
method to the data of actual experiments give us reasonable approximations to
the true distributions of X.


(d) The case where $\sigma$ varies from experiment to experiment


In the above theory we have made an essential assumption that the standard
error of the estimated excesses of the new varieties over the standard does not
change from one experiment to another. This is a possible hypothesis in the case where
all the experiments considered are carried out on a single large field by the same
experimenter with the same care. However, we must be clear, as far as possible,
whether this hypothesis is justified or not. First we may test it by the usual
$L_1$-test (10). If this gives a favourable result then the application of the above
method may be considered as more or less safe. But the $L_1$-test may provide
evidence that the standard error $\sigma$ does vary from experiment to experiment.
This, however, is not necessarily sufficient to make the above methods of
estimating p(X) totally invalid. In fact, the variation of $\sigma$ may exist, but it may be
within a very small range, and in this case we may expect that the result of the
estimation of p(X), based on the assumption that $\sigma$ is throughout constant, will
not be very inaccurate.


Example 3. It would be difficult to study theoretically what inaccuracy in
any particular case may arise in estimating p(X) by the method of moments
described above, when $\sigma$ is not constant. In order, however, to throw some light
on this point another sampling experiment, similar to those in Examples 1 and 2,
was carried out. It was assumed that in each of the 20 hypothetical experiments
in which X varied as in Example 2 the values of $\sigma$ were different. The distribution
of $\sigma$ was assumed to be as given in the following table.


TABLE III
Hypothetical distribution of $\sigma$

$\sigma$     0.65  0.70  0.75  0.80  0.85  0.90  0.95  1.00  1.05  1.10  1.15  1.20  1.25  1.30  1.35
Frequency       1     1     2     1     1     1     2     2     2     1     1     1     2     1     1

Mean $\sigma$ = 1; S.D. of $\sigma$ = 0.2


This is a relatively wide spread so that the example should provide a fairly
severe test of the adequacy of the method based on the assumption that $\sigma$ is
constant. Out of this distribution, using Tippett's random numbers, a random
sample of 20 was drawn and the values of $\sigma$ obtained were associated with
20 hypothetical experiments. To obtain the errors involved in the x's, the original
values of $\epsilon$'s, which were obtained in the previous sampling, were multiplied by the
corresponding values of $\sigma$. This would exactly correspond to random sampling
from a normal population with the particular value of $\sigma$.

The variation in the true values of $\sigma$ from one experiment to another would
also affect the estimate of the variance of x, the change being proportional to $\sigma^2$.
Accordingly, the values of $s^2$ obtained previously for each hypothetical
experiment were multiplied by the appropriate $\sigma^2$.

Having thus obtained a new set of empirical data of 20 hypothetical
experiments with varying $\sigma$, the previous method of moments was applied to estimate
p(X), and the results are shown in Fig. 5.

The histogram, as previously, represents the true distribution of X; the
continuous curve represents its estimate obtained previously when $\sigma$ was constant
from experiment to experiment; and lastly, the dashed curve represents the
estimated p(X) obtained by the same method from data affected by the variation
of $\sigma$. It is seen that the two curves differ, but not very seriously. This may be
considered as an indication that when the variation of $\sigma$ from experiment to
experiment is only moderate, our method may still be used to provide a reasonably
accurate estimate of p(X). This fact is important, because even if the $L_1$-test fails
to detect the variation in $\sigma$ this may still exist.
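
The construction of Example 3 may be sketched in a few lines. The frequencies below are those read from Table III, and the first function merely checks that they give the stated mean and standard deviation of $\sigma$; the second applies the two multiplications described above. Both function names are hypothetical.

```python
from math import sqrt

# Values of sigma and their frequencies, as read from Table III.
SIGMAS = [0.65 + 0.05 * i for i in range(15)]
FREQUENCIES = [1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1]


def mean_and_sd(values, freqs):
    """Mean and standard deviation of a frequency distribution."""
    n = sum(freqs)
    mean = sum(v * f for v, f in zip(values, freqs)) / n
    variance = sum(f * (v - mean) ** 2 for v, f in zip(values, freqs)) / n
    return mean, sqrt(variance)


def rescale(unit_errors, unit_s2, sigma):
    """Turn errors and an s^2 obtained with sigma = 1 into those for `sigma`.

    Errors are multiplied by sigma, the variance estimate by sigma squared.
    """
    return [sigma * e for e in unit_errors], sigma ** 2 * unit_s2


if __name__ == "__main__":
    print(mean_and_sd(SIGMAS, FREQUENCIES))
    print(rescale([1.0, -2.0], 0.9, 1.2))
```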
Fig. 5. Effect of variability of $\sigma$ on efficiency of method of estimating p(X).
Continuous curve: estimated distribution of X's when $\sigma$'s are equal.
Dashed curve: estimated distribution of X's when $\sigma$'s are different and the coefficient of variation
of $\sigma$ is 20% of mean $\sigma$.
Histogram: the true distribution of X's.


IV. APPLICATION TO THE ACTUAL EXPERIMENTAL DATA

In the following, I apply the method described to experimental data with
sugar beet which were kindly supplied by Messrs K. Buszczyński and Sons, Ltd.,
Warsaw, and it is a pleasure to express here my gratitude to the Directors of
this firm. The data used refer to the experiments carried out in 1923 and 1924, in
one of the firm's experimental stations, Górka Narodowa. The total number of
experiments carried out each year was about 100, each comparing with the
standard three new varieties selected and bred by the firm. All these experiments
were carried out on a very large and uniform field by the same staff and using the
same methods. This circumstance makes it probable that the assumption of the
standard error in each experiment being constant, or at least not very variable,
is not far from being true. The number of replications was not the same in all
experiments. In order to get the material into a form convenient for numerical
work, i.e. to have the same number of replications in each experiment, out of each
year's data 40 experiments were selected, each with 5 replications. About the
layout of these experiments I have the following information: the experimental
plots were comparatively narrow and long, and cut across the direction of
ploughing so as to make them as homogeneous as possible. The number of roots
in each plot was 100. Of course, during the vegetative period some of them
perished. The distribution of the varieties in each particular experiment was
systematic, as shown in Fig. 6.


Fig. 6. Arrangement of experiments. 


The systematic arrangement of the experiments did not permit the use of
the customary methods of working out the data, as those assume randomization.
The method used was that proposed by Neyman (10), consisting in estimating the
fertility level for each plot and each variety. The basic assumptions are that:
(i) a fourth order parabola is able to represent the level with sufficient accuracy,
and (ii) the levels corresponding to two different varieties are parallel.

It will be noticed that with the systematic arrangement as shown in Fig. 6, the
comparison between varieties V;, and V,, V;, and V, must be more accurate than 
that of V;, and V,. In all cases we have the same number of replications and in the 
former case the difference in soil in adjoining plots sown with the compared 
varieties must be, on the whole, smaller than in the other. This intuitive inference 
is numerically expressed in Neyman’s formulae and in the final results, but the 
estimated variances of the excesses of V;, over V,, and of V;. over V, appeared to 
be very close, e.g. 


8}, = 8}, = 0-0117, s?, =0-0118._ For this reason these differences 
were ignored. 

Tables IV and V show the values of the x's and $s^2$'s calculated for each of the
varieties compared in years 1923 and 1924. The $L_1$-test applied to the 40 $s^2$'s did
not discover any significant variation in their size in 1923. It was found in fact
that $L_1$ = 0.910 whereas $L_1$(0.05) = 0.906 for f = 13, N = 40. On the other hand the
variation in $s^2$ in 1924 proved to be significant, $L_1$ = 0.854. It follows that while
the data for 1923 gave us no reason to doubt the validity of the assumption that $\sigma$
is constant, it is possible that the variation in $\sigma$ in 1924 will influence unfavourably
the accuracy of the estimate of p(X).
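
For the reader who wishes to repeat the comparison, the criterion can be computed as in the sketch below. It assumes the usual form of the $L_1$ statistic for variance estimates based on equal numbers of degrees of freedom, namely the ratio of their geometric mean to their arithmetic mean, so that small values point to heterogeneity. The significance limits, such as the value 0.906 quoted above, must be taken from the published tables and are not computed here. The function name is hypothetical.

```python
from math import exp, log


def l1_statistic(variances):
    """Geometric mean of the variance estimates divided by their arithmetic mean.

    Assumes every estimate rests on the same number of degrees of freedom.
    The value is 1 when all estimates are equal and falls below 1 otherwise.
    """
    if not variances or min(variances) <= 0:
        raise ValueError("variance estimates must be positive")
    n = len(variances)
    geometric = exp(sum(log(v) for v in variances) / n)
    arithmetic = sum(variances) / n
    return geometric / arithmetic


if __name__ == "__main__":
    print(l1_statistic([0.0117, 0.0118, 0.0117]))
    print(l1_statistic([0.005, 0.02]))
```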

TABLE IV
Values of x (as per cent. of sugar content in beet) and $s^2$ obtained in 40 experiments, 1923
[Tabulated values illegible in the source.]

TABLE V
Values of x (as per cent. of sugar content in beet) and $s^2$ obtained in 40 experiments, 1924
[Tabulated values illegible in the source.]

Here, however, we may remember the encouraging results of the sampling
experiment discussed above as Example 3, which shows that the method of
moments is not very sensitive to moderate variation in $\sigma$. Of course, it would be
desirable to carry out this experiment assuming that the distribution of $\sigma$ is
approximately what it actually was in the experiments considered. For that
purpose an attempt was made to estimate this distribution on lines similar to
those followed to estimate p(X). However, a few sampling experiments carried
out to test the method showed that its efficiency is very poor. Consequently it
was abandoned. On the other hand, the estimate 2 of the variance of the o’s 
based on that of the observed s,’s, which the reader will have no difficulty in 
calculating, namely 


where s§ means, as formerly, the arithmetic mean of the observed variances s? 
and § that of their square roots s;, proved to be fairly accurate. This formula was 
applied to the experimental data of 1924 and it was found that = amounted to 
about 20 %, of §. This empirical result was used to fix the variation of o in the 
sampling experiment of Example 3, so as to have its $.D. also equal to 20 % of the 
mean. The results obtained there suggest that applying the method of moments 
to estimate p (X) for the experimental data of 1924, we should not be very wrong. 

The usual frequency constants calculated for the two years for the distribu- 
tions of x and X are as follows: 


                1923        1924
    s₀²        0.0160      0.0259
    m₂(x)      0.0832      0.1358
    β₁(x)      0.3490      0.1683
    β₂(x)      3.9609      2.8962
    β₁(X)      0.6617      0.3179
    β₂(X)      4.4718      2.8412


Here m₂(x) denotes the variance of x. It will be noticed that the ratio m₂/s₀² is in
the two cases just over 5, which is of about the same order as in the sampling
experiments carried out to test the efficiency of estimating p(X).*

The values of the β's for both p(x) and p(X) suggested type V and type I
Pearson curves in the years 1923 and 1924, respectively. These lead to the
following equations for p(x) and p(X), obtained using the method of moments:


1923:
$$p(x) = 4.72334\,(10)^{\ldots}\,(-x)^{-\ldots}\,e^{97.4138/x} \qquad (27)$$
origin at x = 2.2970;
$$p(X) = 4.27380\,(10)^{\ldots}\,(-X)^{-\ldots}\,e^{35.9874/X} \qquad (28)$$
origin at X = 1.6277;

1924:
$$p(x) = 1.064\left(1+\frac{x}{\ldots}\right)^{9.1518}\left(1-\frac{x}{0.8569}\right)^{3.3989} \qquad (29)$$
origin at x = 0.0260;
$$p(X) = 1.1738\left(1+\frac{X}{1.6359}\right)^{4.0770}\left(1-\frac{X}{0.4464}\right)^{1.1124} \qquad (30)$$
origin at X = 0.0918.

[Quantities shown as … are not legible in the source.]

* For the sampling experiments the values of m₂(x) are given in Table II, p. 40 above, while σ²
was unity.


The curves are represented in Figs. 7 and 8, where the histograms refer to the
observed values of the x's, the continuous curves represent the estimated p(X) and
the dashed curves represent p(x).

It is seen that in the two years the curves differ both in shape and relative 
position with respect to the origin of co-ordinates. This may be due partly to the 
change in atmospheric conditions and partly to the fact that the standard variety 
used was not the same in the two years. In fact, the variety which was used as a
standard was the one which, in the previous year's competitive experiments
carried out by a special commission appointed by the sugar industry in Poland,
proved to be the sweetest. This change in the standard varieties is probably
justified in the special conditions of sugar beet breeding. However, in other cases
as, for instance, in breeding of barleys for brewing, the standard variety would
probably be more stable.

Having this in view, we shall have to consider two possible ways of pro- 
ceeding: one corresponds to the assumption that the standard variety remains 
unchanged from year to year, and the other to the case where the standard 
variety is changed. Because of lack of experimental data corresponding to the 
first situation, we shall explain the procedure, using the material concerning the 
sugar beet described above and ignoring the circumstance that the standard 
variety was in fact not the same in the two years. Thus, the shift in the curve will 
be ascribed solely to the changes in atmospheric conditions. In order to illustrate 
the second situation we shall use the same data, taking into account the fact that 
the standard variety was different in the two years. 

We shall now start by considering the first situation. Let us agree to call "good"
varieties, in each year, those which proved to be sweeter than the standard. The
percentages of these could be found by calculating the areas under the curves,
p(X), extending to the right-hand side of the origin of coordinates. The
calculations showed that, in the year 1923, there were about 87.5% of good varieties
and, in the year 1924, about 46.5%. The sweetest 50% of the good varieties
may be called the "best" varieties. In the year 1923, the best varieties will be
those exceeding the standard by 0.37% of sugar content, and in 1924 this limit
will be 0.23%. Now, we may calculate the probability of detecting a good or a
best variety in an experiment with some particular number of replications, if the
accuracy of those experiments were equal to that of the actual experiments.
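
The area calculation for 1924 can be checked numerically. The sketch below assumes that the constants of the Type I curve for p(X) are 1.1738, 1.6359, 4.0770, 0.4464 and 1.1124, as they appear to read in equation (30), and that "origin at X = 0.0918" means the curve's own variable is the true excess less 0.0918. Simpson's rule is used here for convenience and is not necessarily how the original areas were found; the function names are hypothetical.

```python
# Constants of the 1924 curve as read from equation (30) (an assumption).
Y0 = 1.1738
A1, M1 = 1.6359, 4.0770
A2, M2 = 0.4464, 1.1124
ORIGIN = 0.0918  # true excess at which the curve has its origin

def p_curve(x):
    """Ordinate of the Type I curve, x measured from the curve's origin."""
    if x <= -A1 or x >= A2:
        return 0.0
    return Y0 * (1 + x / A1) ** M1 * (1 - x / A2) ** M2

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

total_area = simpson(p_curve, -A1, A2)
# A true excess of zero corresponds to x = -ORIGIN on the curve's own scale.
good_area = simpson(p_curve, -ORIGIN, A2)
good_fraction = good_area / total_area

print(f"total area under p(X): {total_area:.4f}")
print(f"fraction of good varieties in 1924: {good_fraction:.3f}")
```

Under these assumptions the total area comes out very close to unity and the fraction to the right of zero excess comes out close to the figure of about 46.5% quoted above, which lends some support to the reading of the constants.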
[Figure: Distribution of p(x).]

Applying the method described above (section II, p. 36), Figs. 9 and 10 were
constructed. The outer thick curve represents the part of the distribution p(X)
taken from Figs. 7 and 8 extending to the right of the origin of coordinates. The
area under this curve is, in each case, equal to unity, which means that we
limit our consideration to the good varieties only. The ordinates of all other
curves were obtained by multiplying p(X) by the corresponding values of the
power function β(x) obtained from the Neyman tables. Areas under the
continuous and the dashed curves represent the probabilities of detecting a
good variety with the sugar excess falling within any given limits, if the number of
replications were n = 5, 10, 15 and 20. It was assumed here that the accuracy of
those hypothetical experiments, that is to say the standard error per plot, is
equal to that of the actual ones, but that the layout of the experiments is
different, namely, they were assumed to be arranged in randomized blocks. The
difference between the continuous and the dashed curves is that the former
correspond to the case where the assumed level of significance (the probability of
first kind errors) is α = 0.05 and the latter to the case where it is α = 0.01. It is
seen, as could be expected, that, in the latter case, the detecting power of the
experiments is considerably smaller. Tables VI and VII give the probabilities of
detecting any of the good and any of the best varieties in accordance with the
number of replications and with the level of significance used. These probabilities
are areas under the continuous and the dashed curves in Figs. 9 and 10. For the best
varieties these areas had to be doubled.
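
The construction of the curves in Figs. 9 and 10 amounts to integrating the product of p(X) and a power function over the good varieties. A rough sketch follows. It makes several assumptions that are not in the text: the power function is replaced by a normal approximation to a one-sided test, the standard error of a difference from the standard is taken as the standard error per plot multiplied by √(2/n), and the constants of p(X) are those read from equation (30). The paper itself used Neyman's tables, so the printed figures are not expected to reproduce Table VI exactly; only the qualitative behaviour should agree.

```python
from statistics import NormalDist

# Constants of the 1924 curve as read from equation (30) (an assumption).
Y0, A1, M1, A2, M2 = 1.1738, 1.6359, 4.0770, 0.4464, 1.1124
ORIGIN = 0.0918
STD_NORMAL = NormalDist()

def p_true_excess(excess):
    """Ordinate of p(X) at a true excess measured from the standard."""
    x = excess - ORIGIN
    if x <= -A1 or x >= A2:
        return 0.0
    return Y0 * (1 + x / A1) ** M1 * (1 - x / A2) ** M2

def power(excess, n, sigma_plot, alpha):
    """Normal-approximation power of a one-sided test (a stand-in for the tables)."""
    se_difference = sigma_plot * (2.0 / n) ** 0.5
    z_critical = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(excess / se_difference - z_critical)

def simpson(f, a, b, steps=2000):
    """Composite Simpson's rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def chance_of_detecting_good(n, sigma_plot, alpha):
    """Area under p(X) * power over X > 0, divided by the area of the good varieties."""
    upper = A2 + ORIGIN
    good_area = simpson(p_true_excess, 0.0, upper)
    detected = simpson(
        lambda e: p_true_excess(e) * power(e, n, sigma_plot, alpha), 0.0, upper
    )
    return detected / good_area

# Standard error per plot of 0.254 is the 1924 value quoted in the text.
for n in (5, 10, 15, 20):
    print(n, round(chance_of_detecting_good(n, 0.254, 0.05), 3),
          round(chance_of_detecting_good(n, 0.254, 0.01), 3))
```

As in the tables, the chance of detection rises with the number of replications and falls when the stricter level of significance is used.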

Figs. 9 and 10 and Tables VI and VII may be used to draw conclusions as to
the number of replications to be used in the following years. Our attention must
be directed primarily towards the best varieties. Looking at the tables we see
that the conditions in the two successive years differ enormously: while in 1923
five replications make the chance of detecting a best variety, at α = 0.05, equal
to 0.908, the same chance in 1924 was only 0.542. This is due to the change in the
standard error per plot connected with weather conditions, and also to the shift of
the curve with respect to the origin. In 1923 the standard error per plot was 0.199
and in 1924 it increased to 0.254. Rational planning of future experiments
obviously requires knowledge of the changes in accuracy of the experiment occurring
from year to year. Two years' observations indicate only that the variation may
be very great. According to the prevailing possibilities of using space, additional
labour, etc., when planning experiments for the third year we may take into
account the possibility of weather conditions giving as low an accuracy of
experiments as in 1924. Then it might be thought advisable to use as many as
10 or 15 replications. If, however, such a scale of experiments is for various
reasons prohibitive, it may be necessary to use a smaller number of replications.
Looking at Table VII, we see that if n = 5, and if the accuracy of the experiments
is as bad as in 1924, then it would not be wise to apply the level of significance
α = 0.05, let alone α = 0.01.
[Fig. 10. Probabilities of detecting good varieties in ...
N.B. The four curves of each type, starting from the highest, relate to the cases n = 20, 15, 10 and 5 respectively.]

TABLE VI
Chance of detecting a "good" variety

              1923                 1924
   n     α=0.05   α=0.01     α=0.05   α=0.01
   5     0.649    0.483      0.343    0.166
  10     0.756    0.665      0.485    0.319
  15     0.815    0.744      0.558    0.410
  20     0.841    0.781      0.609    0.478


TABLE VII
Chance of detecting a "best" variety

              1923                 1924
   n     α=0.05   α=0.01     α=0.05   α=0.01
   5     0.908    0.756      0.542    0.288
  10     0.974    0.953      0.767    0.559
  15     0.996    0.991      0.865    0.711
  20     0.999    0.994      0.922    0.814


The procedure to be advised in this case seems to be as follows. If the economic
conditions have forced the breeder in some future year to use only 5 replications,
the decision as to what varieties should be considered as failing to exceed the
standard should be based on the analysis of the whole lot of the experiments as
given in the present paper. If the calculations lead to figures as in Table VII for
1923, then it would mean that the accuracy of the experiments was satisfactory
and probably there would be no objection to the use of the level of significance
α = 0.05 or even 0.01. In fact, the application of α = 0.05 will detect over 90% of all
best varieties and nearly 65% of all good varieties. The remainder may be
neglected. If, however, the calculations lead to a picture similar to what we found
for 1924, then it would be advisable to apply special precaution in order not to
discard the varieties the value of which may be considerable. The best thing to do
would be to classify the varieties tested into the following groups: (i) those for
which the advantage over the standard was proved beyond any reasonable
doubt, even under the prevailing unfavourable conditions; (ii) those for which
the value of x is not significant at α = 0.01 but is so at some greater value of α,
perhaps at α = 0.1 or more, this value being so chosen that the probability of
detecting a "best" variety is considerable, say 0.9 or more;* (iii) the third group
will consist of the remaining varieties which it will be more or less safe to discard.
Obviously, it is difficult to give any general rule discriminating between what is
to be considered as a large and a small chance of detecting a best variety. This
must be left to persons responsible for the whole experimental work and the process
of breeding. The problem of the statistician is accomplished when he finds means
of calculating this chance. Of course, if the number of replications is very
considerable, then all these calculations may not be necessary. But this probably
will be only rarely the case.

TABLE VIII
Chance of detecting a "good" variety

              1923                 1924
   n     α=0.05   α=0.01     α=0.05   α=0.01
   5     0.903    0.729      0.319    0.155
  10     0.969    0.941      0.452    0.297
  15     0.992    0.980      0.519    0.382
  20     0.996    0.991      0.567    0.445


TABLE IX
Chance of detecting a "best" variety

              1923                 1924
   n     α=0.05   α=0.01     α=0.05   α=0.01
   5     0.976    0.891      0.518    0.269
  10     0.994    0.979      0.739    0.526
  15     0.999    0.997      0.842    0.675
  20     1.000    0.999      0.904    0.780


Finally, we must consider the situation which presents itself when the standard
variety is changing from year to year. In this case, the experimenter will have to
consider two points: first, if the distribution p(X) is situated almost entirely to
the left of the origin of coordinates, this may indicate that his method of breeding
and selecting is not satisfactory. The error may lie in the choice of the parent
plants used for crosses. Again, there may be something wrong in his principle
of selection of single plants generating new families. This point lies beyond the
limit of the present paper. Secondly, the experimenter will be interested in the
possibility of making a proper choice out of the existing material. He will
probably use another definition of the best and good varieties.* For example, he
may define the best varieties to be those the sugar content of which exceeds that
of 75% of the whole material. Again, the good varieties may be defined as those
which exceed in sugar content, say, 50% of the whole lot. It is very easy to
calculate the tables analogous to Tables VI and VII corresponding to these new
definitions. The results are given in Tables VIII and IX. The discussion is quite
similar to that given above.

* I may remark that to carry out these calculations Neyman's tables of probabilities of second
kind errors should be extended so as to apply to other levels of significance beyond α = 0.05
and α = 0.01.


V. SUMMARY OF RESULTS 


The whole process of plant breeding may be roughly divided into two parts: 
(i) the production of new families or varieties which may prove to be better than 
the established standards, and (ii) the test whether any of these new varieties do 
exceed in quality the established standards. The second of these steps is connected 
with field trials in which the new varieties are compared with the variety taken 
as a standard. 

The quality of any variety is a very complex conception and depends on a 
large number of different characters. However, there is usually some single 
character of the plants, the importance of which is greater than that of any others, 
and which by itself is being taken as a conventional measure of the quality. This 
may be the average yield, the sugar content, percentage of nitrogen, etc. The 
difference between the average value of such a character in a new variety and in 
the standard is called an excess over the standard, which may be either positive 
or negative. The field trials are not able to give the true values of the excesses but 
only their estimates which are necessarily affected by experimental errors. 
Through these experimental errors it is possible that the new varieties with 
positive and perhaps even relatively considerable excess will not be detected, 
which may lead to their ultimate rejection. It is obvious that such a circumstance 
is unsatisfactory as it involves considerable waste of effort connected with a 
successful breeding of a new variety. The question therefore arises as to what 
number of replications in field trials should be used in order to have a fair chance 
of detecting new varieties with positive and sufficiently large excesses over the 
standard. The solution of this problem requires the knowledge, or at least an 
approximate knowledge, of the distribution of the true excesses (not of their 
estimates) over the standard, likely to be found in new varieties which may 
present themselves for comparison with the standard. Of course, this distribution
is connected with the method of breeding. A method of obtaining an estimate of
the distribution of the true excesses, based on the examination of the results of
similar trials in previous years, is the main topic of the present paper. The method
devised was tested on a few artificially constructed sampling experiments, then
compared with an alternative method advanced by Eddington and Levy, and
found to be satisfactory. It was then applied to actual experimental data
concerning 120 new varieties of sugar beets bred for sugar content by Messrs
K. Buszczyński and Sons, Ltd., Warsaw, and tested in the same conditions on
two adjoining fields in the years 1923 and 1924. Having obtained the estimate of
the frequency distribution of the true excesses in each year, it was then possible
to judge the efficiency of future experiments with n = 5, 10, 15 and 20 replications,
in detecting the "good" and "best" varieties, if the accuracy of the experiments
were similar to those in 1923 and 1924.

* See p. 48 above.

Some general conclusions as to the number of replications to be used and as 
to the method of procedure if the accuracy of the fields proves to be poor have 
been drawn. The method of estimating the distribution of true excesses may be 
useful also when two different methods of selecting new varieties are com- 
pared. And here we come to the original question formulated at the beginning 
of the paper: which course is better, to start say 200 new varieties each year and 
then test them with 5 replications only, or to diminish the number of new 
varieties to some 100 and test them with 10 replications? If the records of sugar 
content of parent plants of 200 varieties already in field trials are available, then 
the breeder is able to see what would have been his results if he had started only 
100 of them. Using all the 200 varieties, he would be in the position to estimate, 
say p₂₀₀(X), the distribution of true excesses among these 200 new varieties and
also, say 200 x P (200, 5), the number of best varieties which he may reasonably 
expect to detect in trials with 5 replications. Again, he may use the records of the 
sugar content (and probably of other properties) of the parent plants to see what 
would be the results of his individual selection if he had decided to start only 100 
new varieties. Picking out of the records of the field trials the data concerning 
the 100 varieties which would have been selected in such a case, he would be in 
the position to estimate, say p₁₀₀(X), the distribution of the true sugar content
among these varieties. This distribution would lead him to, say 100 x P (100, 10), 
the expected number of best varieties which would be detected in the field trials 
with 10 replications. The comparison between 100 x P (100, 10) and 200 x P (200, 5) 
would provide the answer to the question formulated above. 
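
The comparison proposed here reduces to a single multiplication for each plan. In the sketch below the two probabilities are purely hypothetical placeholders, chosen only to show the arithmetic; in practice P(200, 5) and P(100, 10) would come from the estimated distributions of true excesses and the corresponding power functions. The function name is likewise hypothetical.

```python
def expected_best_detected(varieties_started, p_best_and_detected):
    """Expected number of best varieties detected, N x P(N, n)."""
    return varieties_started * p_best_and_detected

# Hypothetical values for illustration only; they are not taken from the paper.
P_200_5 = 0.12   # P(200, 5): 200 varieties, 5 replications each
P_100_10 = 0.20  # P(100, 10): 100 varieties, 10 replications each

plan_a = expected_best_detected(200, P_200_5)
plan_b = expected_best_detected(100, P_100_10)

if plan_a > plan_b:
    better = "200 varieties, 5 replications"
else:
    better = "100 varieties, 10 replications"
print(plan_a, plan_b, better)
```

With other values of the two probabilities the preference may of course be reversed, which is why the estimates of the distributions are needed.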


In conclusion, I wish to express my hearty thanks to Dr J. Neyman for sug- 
gesting this problem to me and for his constant help, both during the course of 
research and whilst writing the paper. 
RURAL MORTALITY. ITS COMPARATIVE
SEX INCIDENCE

London School of Hygiene and Tropical Medicine


THE mortality of England and Wales has exhibited certain characteristics, of 
which the two most prominent have been the subject of various investigations. 
The first of these is geographical; proceeding from the north southwards a decrease 
in mortality is observed. The second is the excess of urban mortality over rural. 
In addition, attention has been drawn to a further feature which would also 
appear to be of a permanent nature, i.e. the ratio of the death-rate in rural 
districts to that of the general population is proportionately lower for males than 
for females. The purpose of this short paper is to attempt as far as is possible 
from the available data an inquiry into this phenomenon. 

The data used were the recorded deaths for males and females during the 
triennial periods 1920-2 and 1930-2 in the aggregated rural districts of the whole
country and its geographical divisions. The ratio of actual to expected deaths
in each area was calculated. Utilizing the records for 1920-2 as an example, the
expected deaths were obtained as follows. The average male death-rates during 
that period for England and Wales, in single years for the first five years of life, 
in quinquennial groups from ages 5 to 85 and of one group of age 85 and over were 
applied to the rural male population, at corresponding ages, recorded at the 1921 
census for each of the areas indicated in Table I. The sums of the resultant series 
gave the numbers of expected male deaths for each area. A similar procedure 
was adopted for the females. The actual deaths were taken as the average of the 
triennia and the index tabulated was that of the actual deaths divided by the 
expected. The results are given in Table I. 
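
The calculation of the index may be sketched as follows. All figures in the example are hypothetical and the age grouping is much coarser than the one described above; the function names are likewise illustrative and not taken from the paper.

```python
def expected_deaths(standard_rates, local_population):
    """Apply the national death-rate at each age to the local population at that age."""
    return sum(standard_rates[age] * local_population[age] for age in local_population)

def mortality_ratio(actual_deaths, standard_rates, local_population):
    """Ratio of actual deaths to the deaths expected at national rates."""
    return actual_deaths / expected_deaths(standard_rates, local_population)

# Hypothetical national death-rates per person per year, by age group.
national_rates = {"0-": 0.080, "5-": 0.003, "85 and over": 0.250}
# Hypothetical rural population of one area at the census, by the same age groups.
rural_population = {"0-": 1000, "5-": 5000, "85 and over": 100}
# Hypothetical average annual deaths recorded in the area over the triennium.
actual = 96

print(expected_deaths(national_rates, rural_population))
print(mortality_ratio(actual, national_rates, rural_population))
```

A ratio below unity, as in this made-up case, means that the area recorded fewer deaths than it would have had at the national rates for its own age distribution.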

Table I shows that the males enjoyed a relatively more favourable mortality 
than the females for both the triennia 1920-2 and 1930-2. For all the areas 
combined the male deaths were 19 and 15 % below the number expected on the 
basis of the whole country for 1920-2 and 1930-2 respectively, while the rural 
female deaths were 12 and 8 % less. In each area the ratio was larger for females 
than for males in each triennial period. The differences between the male and 
female ratios were small in the South-east, South-west and Midlands I divisions. 
The largest differences existed in the two Welsh and in the Northern I division, 
where in both triennia the actual female deaths were in excess of the expected. 
A large difference was also noted in 1930-2 for the Northern III division, where
once again the actual female deaths were greater than the expected. 

In view of the disparity in the ratios of the two sexes at all ages, the analysis 
was next made for specific age periods. The results are shown in Table II. From
this table it appeared that the relatively more favourable male mortality was not 
confined to any particular age period, but was evident in every age group above 
age 15 in every division of the country. Under age 15 there was little or no 
difference between the male and female ratios for the country as a whole, but 
within some divisions some variation occurred. 


TABLE I
Deaths from All Causes in Rural Districts

Actual deaths/Expected deaths

                             1920-2               1930-2
Area                     Males   Females      Males   Females
South-east               0.72    0.76         0.78    0.81
North I                  1.02    1.17         1.00    1.18
North II                 0.81    0.88         0.84    0.96
North III                0.98    1.04         0.96    1.09
North IV                 0.86    0.94         0.88    0.96
Midlands I               0.83    0.85         0.86    0.91
Midlands II              0.81    0.90         0.86    0.94
East                     0.73    0.82         0.77    0.87
South-west               0.77    0.83         0.84    0.88
Wales I                  0.94    1.06         1.02    1.13
Wales II                 0.93    1.05         0.97    1.09
England and Wales        0.81    0.88         0.85    0.92


To determine if any particular cause of death was responsible for the observed
differences, the deaths of the whole country were divided into seven broad
categories. Expected deaths were calculated as before, using the death-rates at ages
from each category for England and Wales as a whole. The ratios of actual to
expected deaths were tabulated and are shown in Table III for all ages and for four
age groups over 15. For all ages, every cause of death with the exception of
violence showed a relatively greater decrease among males than among females.
The greatest differences between male and female ratios were those for
pulmonary tuberculosis and cancer, where the males showed a relatively greater
improvement than did the females.

The excess of the female ratios over the males from these two causes of death
was common to every division of the country, as can be observed from Table IV.
Generally a large difference between the male and female ratios of actual to
expected deaths from pulmonary tuberculosis was associated with a large
difference between the ratios from cancer. Not only was this association evident
within the divisions but it was also exhibited in each triennial period, that is to
say the divisions where the female ratio showed a large excess over the male in
1920-2 also generally showed a large difference in 1930-2.


TABLE II
Deaths from All Causes in Rural Districts classified according to Age and Area
Actual deaths/Expected deaths
[The entries of this table are not legible in the source.]


TABLE III
Deaths in Rural Districts classified according to Age and Cause of Death

Actual deaths/Expected deaths

1920-2
                               All ages        15-           25-           45-           65-
                               M     F       M     F       M     F       M     F       M     F
Influenza                     0.88  0.90    0.88  1.08    0.80  0.92    0.79  0.82    0.98  0.93
Pulmonary tuberculosis        0.69  0.91    0.80  0.96    0.76  0.94    0.57  0.85    0.59  0.90
Other respiratory diseases    0.64  0.67    0.67  0.87    0.56  0.75    0.54  0.57    0.69  0.70
Cancer                        0.84  0.93    0.84  1.03    0.83  0.91    0.77  0.88    0.91  0.96
Circulatory                   0.87  0.92    0.67  0.73    0.81  0.75    0.79  0.88    0.92  0.96
Violence                      0.96  0.85    1.06  1.10    1.04  0.92    0.99  0.84    0.82  0.75
Other causes                  0.86  0.92    0.86  1.12    0.86  1.05    0.80  0.90    0.92  0.98

1930-2
                               All ages        15-           25-           45-           65-
                               M     F       M     F       M     F       M     F       M     F
Influenza                     1.06  1.10    0.98  1.01    1.04  1.12    0.97  1.03    1.11  1.13
Pulmonary tuberculosis        0.65  0.88    0.68  0.85    0.73  0.94    0.56  0.85    0.63  0.89
Other respiratory diseases    0.73  0.77    0.71  0.82    0.70  0.85    0.64  0.70    0.79  0.80
Cancer                        0.86  0.95    0.80  1.04    0.82  0.88    0.81  0.93    0.91  0.97
Circulatory                   0.83  0.89    0.70  0.70    0.73  0.78    0.75  0.88    0.86  0.91
Violence                      1.05  0.88    1.31  1.33    1.19  0.98    1.05  0.85    0.78  0.73
Other causes                  0.90  0.97    0.95  1.11    0.90  1.08    0.85  0.96    0.94  1.02


It has long been known that occupation affects mortality, and the opinion
has also been expressed that migration is an additional factor contributing to the
high comparative mortality in young adult life in the rural areas. It has been
observed that females migrate from the rural districts at an earlier age than males.
If, as has often been suggested, only the healthier persons move to the towns,
then the excessive female migration would have an adverse effect on the rural
female mortality. This factor has been offered as an explanation of the relatively
high rural female mortality in the age group 15-25.

Thus two further features emerge which might be put forward as contributing 
to the phenomenon under consideration. An examination of this aspect must of 
necessity be limited owing to the lack of suitable data. An attempt was made, 
however, to divide the rural areas of the country according to these two headings, 
(1) occupation, and (2) migration. No unit smaller than a county could be taken. 
The counties were distributed in relation to the occupations followed by the male 
inhabitants of their rural areas and were ultimately classified into three broad 
groups, (1) those with less than 33%, (2) those with 33 to 50% and (3) those with
more than 50% of the rural males engaged in agriculture.

Turning next to the problem of migration, the only measure readily accessible 
was a ratio of female to male inhabitants of the rural areas for each county. For 
England and Wales as a whole the sex ratio was, in 1921, 109.5 females per 100
males and in 1931 the ratio was 108.8. To indicate the extent of female migration,
the counties have been divided into three categories of ascending sex ratio,
(1) less than 100%, (2) 100-104% and (3) more than 104%.

Table V was drawn up to show the ratio of actual to expected deaths in accord- 
ance with these two groupings. For each of the occupational groups the female 
ratio was in excess of the male for both triennia. The non-agricultural group had 
the largest ratio and differed significantly from the other two groups, whilst there 
was practically no difference between the ratio of the second and third occupa- 
tional groups. The female ratios exceeded those of the males by an almost con- 
stant amount throughout and did not indicate that occupation was a cause of 
the discrepancy between the male and female ratio of actual to expected deaths. 

The group with the lowest sex ratio had the highest ratio of actual to expected 
deaths. This group differed significantly from each of the remaining two, between 
which there was no difference. The excess of the female ratio over the male steadily 
declined with increasing sex ratio. The difference between the first and third 
group, in both triennia, was statistically significant. 
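
The significance statements in this paragraph can be checked from the standard errors printed in Table V. The sketch assumes that the groups are independent, so that the standard error of a difference is the square root of the sum of the squared standard errors, and it treats a difference of more than about twice its standard error as significant; that threshold is a conventional assumption, not one stated in the paper. The function name is hypothetical.

```python
import math

def se_of_difference(se_a, se_b):
    """Standard error of the difference of two independent quantities."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

# Excess of the female ratio over the male ratio, with standard errors,
# for the first and third sex-ratio groups of Table V.
excess = {
    "1920-2": {"first": (0.09, 0.0117), "third": (0.05, 0.0091)},
    "1930-2": {"first": (0.11, 0.0123), "third": (0.05, 0.0095)},
}

ratios_to_se = {}
for period, groups in excess.items():
    (d1, se1), (d3, se3) = groups["first"], groups["third"]
    difference = d1 - d3
    se = se_of_difference(se1, se3)
    ratios_to_se[period] = difference / se
    print(period, round(difference, 2), round(se, 4), round(difference / se, 2))
```

The same rule reproduces the standard errors tabulated for the male and female differences; for instance, standard errors of 0.0063 and 0.0069 combine to about 0.0093.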

The standard error of the ratios, 7, given in this table was taken to be 
o, = /D/E, where D = actual deaths and EH = expected deaths. This result was 
arrived at as follows: 

Notation: P’= population in the age groups considered, d’ = death-rate per 
unit in this group; P and d are the corresponding quantities in the same age group 
in England and Wales; » denotes summation over all age groups and V(u) = 
sampling variance of any quantity, w. 

If the actual deaths in any group only differ through chance from the England 
and Wales value, then 

$$D = \sum(P'd'), \qquad V(D) = \sum\{P'^2\, V(d')\},$$

where

$$V(d') = \frac{d(1-d)}{P'}. \qquad (1)$$

TABLE V 

Deaths from All Causes in Rural Districts classified according 
to Occupation and Sex Ratio 

Actual deaths/Expected deaths 

1920-2 
                                       Males            Females          Differences 
Percentage of occupied males 
engaged in agriculture 
  (1) Less than 33 %               0.89 ± 0.0063    0.97 ± 0.0069    0.08 ± 0.0093 
  (2) 33 to 50 %                   0.75 ± 0.0056    0.81 ± 0.0061    0.06 ± 0.0083 
  (3) Over 50 %                    0.76 ± 0.0083    0.84 ± 0.0092    0.08 ± 0.0124 
  Differences between (1) and (2)  0.14 ± 0.0084    0.16 ± 0.0092 
                      (1) and (3)  0.13 ± 0.0104    0.13 ± 0.0115 
                      (2) and (3)  0.01 ± 0.0100    0.03 ± 0.0110 
Females/Males 
  (1) Less than 100                0.90 ± 0.0077    0.99 ± 0.0088    0.09 ± 0.0117 
  (2) 100-104                      0.77 ± 0.0059    0.83 ± 0.0065    0.06 ± 0.0088 
  (3) Over 104                     0.79 ± 0.0063    0.84 ± 0.0066    0.05 ± 0.0091 
  Differences between (1) and (2)  0.13 ± 0.0097    0.16 ± 0.0109 
                      (1) and (3)  0.11 ± 0.0099    0.15 ± 0.0110 
                      (2) and (3)  0.02 ± 0.0086    0.01 ± 0.0093 

1930-2 
                                       Males            Females          Differences 
Percentage of occupied males 
engaged in agriculture 
  (1) Less than 33 %               0.92 ± 0.0065    1.00 ± 0.0073    0.08 ± 0.0098 
  (2) 33 to 50 %                   0.81 ± 0.0059    0.86 ± 0.0063    0.05 ± 0.0086 
  (3) Over 50 %                    0.80 ± 0.0086    0.89 ± 0.0098    0.09 ± 0.0130 
  Differences between (1) and (2)  0.11 ± 0.0088    0.14 ± 0.0096 
                      (1) and (3)  0.12 ± 0.0108    0.11 ± 0.0122 
                      (2) and (3)  0.01 ± 0.0104    0.03 ± 0.0117 
Females/Males 
  (1) Less than 100                0.93 ± 0.0080    1.04 ± 0.0094    0.11 ± 0.0123 
  (2) 100-104                      0.81 ± 0.0061    0.88 ± 0.0068    0.07 ± 0.0091 
  (3) Over 104                     0.84 ± 0.0065    0.89 ± 0.0069    0.05 ± 0.0095 
  Differences between (1) and (2)  0.12 ± 0.0101    0.16 ± 0.0116 
                      (1) and (3)  0.09 ± 0.0103    0.15 ± 0.0117 
                      (2) and (3)  0.03 ± 0.0089    0.01 ± 0.0097 

N.B. The figures after the ± sign are standard errors. 
Hence approximately 

$$V(D) = \sum(P'd) = E. \qquad (2)$$

If the hypothesis of chance variation were true, then we should substitute the 
true death-rates into (1). In the present case, however, we know from the 
general consistency of the results and from previous knowledge that the 
death-rates are lower in the rural areas; it seemed therefore better to take 

$$V(d') = \frac{d'(1-d')}{P'}, \qquad (3)$$

giving approximately 

$$V(D) = \sum(P'd') = D, \qquad (4)$$

or $\sigma_D = \sqrt{D}$. 

It will be legitimate to neglect the error in E compared with the error in D, owing 
to the larger populations on which the death-rates in England and Wales are 
based; we find therefore 

$$\sigma_r = \sqrt{D}/E.$$
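The standard-error rule derived above can be sketched numerically. The following is only an illustration: the function names and the death counts are hypothetical, and the root-sum-of-squares rule for a difference of two independent ratios is an assumption, although it does reproduce the "Differences" column of Table V (for example 0.0063 and 0.0069 give 0.0093).

```python
import math


def ratio_standard_error(actual_deaths, expected_deaths):
    """Standard error of r = D/E, taken as sqrt(D)/E.

    The error in E is neglected, as in the text.
    """
    return math.sqrt(actual_deaths) / expected_deaths


def difference_standard_error(se_a, se_b):
    """Standard error of a difference of two ratios.

    Assumption: the two ratios are independent, so the variances add.
    """
    return math.hypot(se_a, se_b)


if __name__ == "__main__":
    # Hypothetical counts, not taken from the paper.
    D, E = 20000, 22500
    print("ratio r      :", round(D / E, 2))
    print("s.e. of r    :", round(ratio_standard_error(D, E), 4))
    # Males and females, group (1), 1920-2, from Table V.
    print("s.e. of diff :", round(difference_standard_error(0.0063, 0.0069), 4))
```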


CONCLUSIONS 

In relation to the general death-rate in England and Wales, the mortality in 
the rural areas during the triennia 1920-2 and 1930-2 was proportionately 
lower for males than females. This was not only true of these aggregated areas, but 
also of the rural areas within the major divisions of the country and was apparent 
at all ages above age 15. The causes of death which largely contributed to this 
were, as far as can be judged, phthisis and cancer. Emigration may be an influen- 
tial factor. The sex ratio (male/female) of the populations in rural areas favoured 
the male. If this fact is accepted as evidence of a greater exodus of female migrants 
there arises the probability that the residual female population is the more 
unhealthy. This suggestion seemed to be confirmed by the fact that where the sex 
ratio in the population was highest, the divergence in the ratios of the actual to 
expected deaths for males and females was lowest. 
[Plate. Nussey: A Piebald Family. Photographs of Teddy P.'s father (IV 3) and of Teddy P. (V 1).] 

A PIEBALD FAMILY 
By A. M. NUSSEY 


PIEBALDS are sufficiently rare to warrant the publication of a hitherto unrecorded 
family. 

Cockayne (1933) gives a full account of this condition in his book under the 
heading of “Abnormalities of Pattern”, and a bibliography will be found there. 
The condition behaves as a dominant and three types are described. 

My piebald family would fall into the subgroup with a white frontal blaze 
and pigmented dorsal stripe, the remaining two varieties being one with no 
frontal blaze and white dorsal stripe, and the other with no white frontal blaze 
and dorsal surface pigmented. 

Only three piebald families have been recorded before in England: the 
London one by Bishop Harman (1909), and two other families described by 
Cockayne (1914, 1935) both of which came from Suffolk. My piebald family is 
domiciled in and around Birmingham, and as far as I could find did not originate 
in Suffolk. Tradition has it that the first piebald in the family was a Frenchman 
who settled in England about 100 years ago. 

Unfortunately the family which I am about to describe showed great 
reluctance to come forward, and as a result this article is not fully documented. 

The only individual whom I was privileged to see and photograph is Teddy P. 
(V1). He is of dark complexion, has dark eyes (no heterochromia) and shows a 
frontal blaze, unpigmented patch of skin in the centre of the forehead (only 
faintly visible in the photograph), a small white patch to the right of the 
umbilicus, an extensive patch in the front of the upper part of the left leg, and 
another but smaller patch in the corresponding position of the right leg. 

The boy’s mother assured me that the father (IV 3) and all the other affected 
members of the family show exactly the same markings. The father is very 
sensitive about his white forelock to the extent of keeping his cap on almost 
continuously, but I was able to obtain a photograph of him as a youth showing 
the white blaze. I subsequently saw the father (IV 3) and confirmed that the 
distribution of the white blaze and other unpigmented patches is practically 
identical with that of his son. The boy’s grandmother (III 3) agreed with the 
details of the tree, and told me that other affected members were also very 
sensitive about the white blaze. An uncle (III 4) went so far as to apply chemicals 
to disguise it and as a result lost his hair in that region. 

I 1 (C.) is said to have exhibited the typical markings and to have transmitted 
it to II 1 (P. née C.), but nothing is known about their sibs. 
II 1 had nine children, five boys (III 2, 4, 6, 8, 10), all of whom were affected, 
and four girls, two of whom (III 12, 16) escaped. 

Subsequent transmission occurs, as is usual with dominant characters, only 
through affected members whether male or female, and so we see that III12 who 
married twice and had three boys (IV 1, 2, 3) transmitted it to one of them 
(IV 3) and through him to V1. The same happens in the case of II114, IV 5 and 
V2; in I116 and IV8; III 8 and IV9, 11, 12; Ii114, IV15 and V3; and II114, 
IV 19 and V6. 

The total number of members in the five generations is 38, of which 21 were 
affected and 17 were not. It must, however, be taken into account that six of 
the latter (IV 13, IV 14, V 4, 7, 8 and 9) could not, according to Mendelian 
laws, have exhibited the abnormality, and so if one deducts also the first two 
piebalds (I 1, II 1) about whose sibs nothing is known, this gives a proportion of 
19 piebalds to 11 who did not exhibit this character, which is close enough to the 
expected ratio of 1: 1.* Among these 30 individuals we find: 

(a) 11 affected and 2 unaffected males, 

(b) 8 affected and 9 unaffected females. 

There is thus a considerable preponderance of piebalds among the males.† 
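The standard errors quoted in the two footnotes are consistent with binomial sampling at the expected 1 : 1 ratio. A minimal sketch follows, assuming a binomial model with p = 1/2; the function name is illustrative only.

```python
import math


def binomial_departure(observed, n, p=0.5):
    """Return (expected count, standard error, departure in s.e. units).

    Assumes each of the n individuals is affected independently with
    probability p, which is 1/2 for a dominant character.
    """
    expected = n * p
    standard_error = math.sqrt(n * p * (1 - p))
    return expected, standard_error, (observed - expected) / standard_error


if __name__ == "__main__":
    # 19 piebalds among the 30 individuals at risk.
    print(binomial_departure(19, 30))
    # 11 piebalds among the 13 males at risk.
    print(binomial_departure(11, 13))
```

On these figures the departure for all 30 individuals is under 1.5 standard errors, while that for the males is about 2.5.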


In conclusion I should like to express my thanks to Dr Cockayne for his 
helpful criticism. 


REFERENCES 
COCKAYNE, E. A. (1914). Biometrika, 10, 197. 
— (1933). Inherited Abnormalities of the Skin and its Appendages. Oxford Univ. Press; 
Humphrey Milford. 
— (1935). Biometrika, 27, 1. 
HARMAN, Bishop (1909). Trans. Ophth. Soc. 39, 25. 


* The departure of 19 from the expected value 15 has a standard error of 2.74, and therefore 
cannot be considered significant. 

† 11 differs from the expected value of 6.5 by 4.5; the appropriate standard error is 1.80 and 
the difference may therefore be significant. 
A NEW METHOD OF EXPERIMENTAL SAMPLING 
ILLUSTRATED ON CERTAIN NON-NORMAL 
POPULATIONS 


By G. B. HEY 


1. INTRODUCTION 


THE theoretical distribution of many statistics calculated from small samples 
is known when the population is normal, but when it is not normal we know very 
little about the distribution of such statistics. Such work as has been done has 
generally assumed population forms of standard types, but we may occasionally 
come up against samples from populations which do not appear to fit into any 
known type. This has led to many attempts being made to build up, by experi- 
mental sampling from non-normal data, partial populations of samples from 
which can be inferred in an empirical way the laws of distribution followed by 
derived statistics. A list of papers dealing with this subject which have come 
to the author’s notice is given on pp. 79, 80 below. 

In many cases it has been found that in sampling from curves with one mode 
not at the end of the range, the distributions of statistics such as “Student’s” 
“t”, the correlation coefficient and, in certain cases, Fisher’s “z”, differ very 
slightly from one population to another. On the whole these investigations have 
suggested that in such cases we can neglect the departure of the population from 
normality without introducing serious error into our tests of significance. 

The possibility of further theoretical work must not be overlooked, but unless 
our results are independent of population form (as, for instance, in recent work 
by Pitman and Welch) it is unlikely that we shall be able to make much practical 
use of the results. It is customary to designate a non-normal population by the 
values of β₁ and β₂; but in the case of samples of 100 or less from a normal 
population the range of values of β₁ and β₂ excluding 5% of the total at each end 
is comparable with the range of β₁ and β₂ in the non-normal populations which 
have been used for sampling experiments. Further, this range of populations is 
considered by E. S. Pearson to cover most cases which will be found to occur in 
practice. On these grounds I think that conclusions of practical value are most 
likely to be reached by further sampling. 

No attempt appears to have been made to carry out an experimental 
sampling from a bivariate population in which the distribution surface is not 
normal and in which the correlation coefficient is high, or to take sets of samples 
from a univariate non-normal population and to assign the samples to blocks 
and treatments in a randomized block experiment, taking a completely fresh 
sample each time. Eden & Yates (1933) carried out sampling from a set of 32 
values, assigning blocks in a constant arrangement and treatments at random 
within these blocks. Unfortunately their set of values was “as nearly normal as 
could be expected in a sample of 32” (Neyman, 1935, p. 114). E. S. Pearson (1931a) 
took many samples but with only one form of classification, though he suggested 
the consequences that were likely to follow in more complex analysis. These two 
papers seem to be the only ones dealing with sampling in the case of Analysis of 
Variance, and neither covers the state of affairs that we are considering here. 

I have therefore carried out an experimental sampling from four non-normal 
populations of which three occurred in the course of an agricultural trial. The 
statistics which I have considered are the correlation and regression coefficients 
and the ratio of two independent estimates of variance. Now most sampling 
investigations have been concerned with artificial populations such as the 
rectangular, triangular, normal and the various Pearson types, so that the 
mathematical form of the frequency distribution is known. The three populations 
which occurred in practice did not appear to follow any mathematical law of the 
type usually considered, although many attempts at curve fitting were made. 
The populations are similar to one used by E. S. Pearson (1931a, b) but are 
rather more extreme and somewhat irregular. 





2. DESCRIPTION OF THE EXPERIMENTAL WORK 

Preliminary considerations. Before commencing a practical investigation it 
is necessary to consider the number of samples which must be taken in order 
that we may have a reasonable chance of obtaining information of value. In 
applications to practical work we are usually interested in the tails of our derived 
frequency distributions, that is, those areas at the ends of the range containing 1% or 
5% of the total area under the curve. Now the chance of any sample giving a 
value of our statistic which lies within one of these extreme classes is small, so 
that the number in one of these classes is distributed approximately in the 
Poisson distribution. 

Suppose that we agree to regard the number in one of these classes as being 
significantly different from expectation if it, or a more extreme value, should 
occur less than once in 20 times; then if our expectation is 10 we shall accept all 
frequencies which actually occur if they lie between 4 and 17; if the expectation 
is 30 the limits are about 20 and 43, and if it is 50 they are about 37 and 66. This 
means that if we want to be fairly sure of getting an estimate within about 25% 
of its value of the frequency in any class, then the expectation in that class must 
be about 50. We see from this that a sample of 200 (values of our statistic) will 
give no information about the 1% points and little about the 5% points; a 
sample of 1000 will give little about the 1% points and a reasonable estimate of 
the 5% points. Now to take 1000 random samples of size 20, say, from a 
population, even when using Tippett’s numbers, is a considerable undertaking, and 
in the case of an analysis of variance with two or more classifications the 
subsequent computing can be very laborious without special machines. I have 
therefore devised methods of doing all this automatically with the use of tabu- 
lating machines, and I believe that the method is new. The essential processes 
are described in a paper on the subject by Comrie, Hey and Hudson (1937). 
In addition to the machines there described I have used the rolling total 
tabulator, the main property of which is that it can transfer numbers from one 
counter to any other, or to any combination of others. It is very convenient for 
the production of the sums of squares and products of small groups of numbers. 
The speed of the machine is great; for instance, it produces n, Σx, Σy, Σx² and 
Σxy where there are 20 pairs (x, y) in 20 seconds. 
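The acceptance limits quoted above for a Poisson count can be checked with a short calculation. This sketch assumes that "less than once in 20 times" is split equally between the two tails (2.5% each); that reading is an assumption, chosen because it reproduces the limits 4 and 17 for an expectation of 10. The limits for expectations of 30 and 50 are given in the text only as approximate, and an exact calculation may differ from them by a unit or two.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k events for a Poisson variable."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def acceptance_limits(mean, level=0.05):
    """Smallest and largest counts not rejected at the given two-sided level.

    A count k is accepted when both P(X <= k) and P(X >= k) are at least
    level / 2 (assumed equal-tail rule).
    """
    tail = level / 2
    k = 0
    cdf = poisson_pmf(0, mean)          # cdf holds P(X <= k)
    while cdf < tail:
        k += 1
        cdf += poisson_pmf(k, mean)
    lower = k
    while 1.0 - cdf >= tail:            # P(X >= k + 1) still large enough
        k += 1
        cdf += poisson_pmf(k, mean)
    upper = k
    return lower, upper


if __name__ == "__main__":
    for expectation in (10, 30, 50):
        print(expectation, acceptance_limits(expectation))
```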
3. DESCRIPTION OF THE POPULATIONS 


The populations used are shown in Table I and in the figure with the values of 
mean, variance, β₁ and β₂. Population I is ungrouped, being the number of ears 
in each of 7200 6-inch single-row lengths of wheat. No. II is grouped into intervals 
of 1 gram, the observations being the weights of grain on these same 7200 6-inch 
lengths measured to 0.1 gram; the original figures were used in the calculations, 
the grouping being for purposes of description only. No. III is similar to No. II, 
TABLE I 

Frequency distributions of the original populations 

                        III                      IV 
  x     I     II     x   x+30   x+60        x   x+30   x+60 
  0    113    245    19     —      —         3     —      — 
  1     81    255     8    27      6         9    45      5 
  2    163    461    12    [?]     5        14    43      4 
  3    259    636    17    25     [?]       19    42      4 
  4    365    743    21    [?]     6        24    40      3 
  5    440    673    28    20      4        29    39      3 
  6    56[?]  755    36    33      2        33    37      2 
  7    614    679    40     9      2        36    36      2 
  8    706    591    47    34      3        39    34      1 
  9    664    516    44    31      1        41    33      1 
 10    655    398    71    [?]     2        43    31      1 
 11    498    329    61    18      1        45    29      1 
 12    51[?]  238    69    12      1        48    28      0 
 13    441    160    93    16      2        49    27      0 
 14    332    148    81    14      2        50    25      0 
 15    223     98    70    15      0        51    23      0 
 16    170     93    63    14      1        52    22      0 
 17    131     54    55    14      0        53    20      0 
 18     95     34    76    10      1        53    19      0 
 19     68     22    72     3      0        54    17      0 
 20     29     14    87     7      0        54    16      0 
 21     17     14    63     7      1        53    15      0 
 22     11     14    65    11      0        53    13      0 
 23      8      5    64    13      0        52    12      0 
 24      4     10    45    [?]     0        52    11      0 
 25      4      5    45     7      1        51    10      0 
 26      0      5    55     6      0        50     9      0 
 27      0      1    40     6      0        49     8      0 
 28      0      2    41     5      1        48     7      0 
 29      1  1 at 34  33     5   1 at 91     47     7      0 
 30      1  1 at 38  38     6   1 at 98     46     6      0 

The second and third columns of populations III and IV give frequencies for the groups 
x = 31-60 and x = 61-90 respectively. [?] marks an entry that is not fully legible. 

Frequency constants 

Population            I         II        III       IV 
Mean                9.172     6.838    22.949    26.157 
Variance           18.07     17.78    206.0     203.2 
β₁                  0.089     0.97      1.446     0.219 
β₂                  3.058     4.87      4.755     2.625 
Grouping interval   1.0       1.0       1.0       1.0 
but refers to a second year’s experiment, and No. IV is a smooth Pearson Type I 
curve whose equation is 
4 x ) x ) 
y = 53-55 | -— 3 
y (5 60) \20 


The total frequency of Nos. III and IV is 2031, it being considered that a 
population with this total frequency was large enough. 

The 7200 observations in No. III were grouped into groups of 0.3 gram, and 
the numbers in each group reduced in the same ratio. The correlation between the 
variables in populations I and II, as estimated from 7200 pairs, is 0.712, and the 
coefficient of regression of grain weight on ear number is 0.736, the regression 
being sensibly linear. The corresponding pairs for these were already punched 
on the same cards, and so we are taking samples from a correlation table with 
7200 entries. Populations III and IV were also in pairs on the cards, but were 
entered so as to be uncorrelated. The methods of entering the numbers on to the 
cards will not be described here. 


4. THE CALCULATIONS MADE ON THE TABULATOR AND MULTIPLYING PUNCH 

The 7200 cards containing the first two populations were sorted into a random 
order and about 165 sets of 12 counted out by hand. Let us call the two numbers 
on each card F and G. Then the cards were passed through the multiplying 
punch which formed ΣF² and ΣFG at one run, and ΣG² and ΣFG at the next run; 
the recurrence of ΣFG provides a check. The tabulator gave the sums ΣF and 
ΣG (the summation is over 12 pairs). This sampling was done twice to give in all 
332 samples of 12. The populations III and IV were on a new set of cards and 
from this set samples of 20 were taken, each 20 being replaced before the next 20 
were drawn, until 1008 samples had been taken from each population. A complete 
list of these is given by the tabulator, together with the totals of each set of 20. 

Using the rolling total tabulator we produce twice the sum of squares of the 
numbers in each sample, and at the same time twice the sum of products of the 
two numbers, one from each population, and also the sum of the 20 numbers 
themselves. By this means we have all the totals that we require and several 
checks on the operations of the machine. The sums of squares are all hand- 
punched on to new cards. We must next assign the four imaginary blocks and 
five imaginary treatments to the 20 numbers; this is done by identifying the 
cards with 11, 12, 13, 14, 15; 21, 22, 23, 24, 25; 31, ..., 35; 41, ..., 45. It is now 
possible with a single run through the tabulator to produce the sums of the 
numbers in groups according to the first or second figure of the identification. 
These sums are referred to as the block and treatment totals, or the sums in 
fours or fives. The totals in fours, fives and twenties are hand-punched on to 
further cards and sorted into groups so that all equal numbers are together, and 
it is arranged that each group is preceded by a card containing the square of that 
number. This square is transferred by the reproducing punch to each card of the 
following group until the group ends, when the number which is being 
transferred is changed automatically. We have now certain figures available for 
constructing the Analysis of Variance table. Calling the numbers $x_{ij}$ 
($i = 1, 2, 3, 4$; $j = 1, 2, 3, 4, 5$), we have: 

$$\sum_j x_{ij} = B_i, \qquad \sum_i \sum_j x_{ij} = G,$$

$$\sum_i x_{ij} = T_j, \qquad \sum_i \sum_j x_{ij}^2 = S,$$

and our Analysis of Variance will read: 

Variation due to   D.F.   Sum of squares 
Blocks               3    $\tfrac{1}{5}\sum B_i^2 - \tfrac{1}{20}G^2 = \tfrac{1}{20}[4\sum B_i^2 - G^2]$ 
Treatments           4    $\tfrac{1}{4}\sum T_j^2 - \tfrac{1}{20}G^2 = \tfrac{1}{20}[5\sum T_j^2 - G^2]$ 
Error               12    $S - \tfrac{1}{5}\sum B_i^2 - \tfrac{1}{4}\sum T_j^2 + \tfrac{1}{20}G^2 = \tfrac{1}{20}[20S - 4\sum B_i^2 - 5\sum T_j^2 + G^2]$ 
TOTAL               19    $S - \tfrac{1}{20}G^2 = \tfrac{1}{20}[20S - G^2]$ 

Now we cannot make the tabulator divide; we can make it multiply by two, and 
by repeating this process and combining the contents of counters in various ways 
it is possible to produce all the quantities in square brackets in the table, and 
since we shall be concerned only with the ratios of these quantities, we can 
neglect the factor 20. By feeding the cards containing $B_i^2$, $T_j^2$, $G^2$ and $S$, and by a 
suitable arrangement of counters, the machine produces the table in the form 
shown above in less than five seconds. 

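The division-free quantities in square brackets can be sketched in a few lines. This is an illustration only, not a description of the machine procedure: the function names are hypothetical, and the sample is assumed to be laid out as four blocks (rows) of five treatments (columns).

```python
def anova_brackets(x):
    """Return the bracketed sums of squares (each 20 times the usual value).

    x is a list of 4 rows (blocks), each holding 5 numbers (treatments).
    Only additions and multiplications are used, as on the tabulator.
    """
    block_totals = [sum(row) for row in x]                # B_i
    treatment_totals = [sum(col) for col in zip(*x)]      # T_j
    grand_total = sum(block_totals)                       # G
    sum_squares = sum(v * v for row in x for v in row)    # S
    sum_b2 = sum(b * b for b in block_totals)
    sum_t2 = sum(t * t for t in treatment_totals)
    g2 = grand_total * grand_total
    return {
        "blocks": 4 * sum_b2 - g2,
        "treatments": 5 * sum_t2 - g2,
        "error": 20 * sum_squares - 4 * sum_b2 - 5 * sum_t2 + g2,
        "total": 20 * sum_squares - g2,
    }


def variance_ratio(brackets, numerator, numerator_df, denominator, denominator_df):
    """Ratio of two mean squares; the common factor 20 cancels."""
    return (brackets[numerator] / numerator_df) / (brackets[denominator] / denominator_df)


if __name__ == "__main__":
    # Hypothetical sample of 20 numbers, for illustration only.
    sample = [[7, 9, 4, 12, 6],
              [10, 8, 5, 11, 9],
              [6, 13, 7, 9, 8],
              [9, 10, 3, 14, 7]]
    table = anova_brackets(sample)
    print(table)
    print(variance_ratio(table, "blocks", 3, "error", 12))
```

The three component lines add up to the total line, which gives the same kind of check on the arithmetic as the repeated totals gave on the machine.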
5. SUBSEQUENT CALCULATIONS 

Since the tabulator cannot divide other than, in effect, by multiplying by the 
least common multiple, it is impossible for us to get any further using the 
automatic machines. However, the work which has been done on them has 
resulted in an immense saving of labour on a very dull task. 

First experiment. We set ΣF² on the levers of a Brunsviga and multiply it 
by 12, and then after subtracting (ΣF)², we have 12Σ(F − F̄)²; carrying out 
similar calculations for ΣFG and ΣG² we produce the correlation and regression 
coefficients very rapidly. The distribution of the correlation coefficient and of 
Fisher’s transformation of this coefficient are considered; also the distribution 
of the regression coefficient is worked out. In addition to this the sets of samples 
of 12 for population I were combined by taking 6 at random, and from this set of 
72 we can form estimates of the variance within and between the sets of 12. The 
ratio of these estimates, based respectively on 5 and 66 degrees of freedom, is 
distributed in a known form derived from the Incomplete Beta Function when 
the population is normal. Altogether 383 such samples were taken. The results, 
which diverged considerably from expectations and led up to the second 
experiment, are examined in the next section. 
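The hand calculation of the first experiment, which forms n times each corrected sum and then takes ratios, can be sketched as follows. The function name and the small data set are illustrative assumptions; only the arithmetic follows the text.

```python
import math


def correlation_and_regression(n, sum_f, sum_g, sum_f2, sum_g2, sum_fg):
    """Correlation of F and G, and regression of G on F, from raw sums.

    Each corrected quantity is n times the usual sum of squares or
    products, e.g. n*sum_f2 - sum_f**2 = n * sum((F - mean F)**2),
    so the factor n cancels in both ratios.
    """
    s_ff = n * sum_f2 - sum_f ** 2
    s_gg = n * sum_g2 - sum_g ** 2
    s_fg = n * sum_fg - sum_f * sum_g
    r = s_fg / math.sqrt(s_ff * s_gg)
    b = s_fg / s_ff
    return r, b


if __name__ == "__main__":
    # Hypothetical sample of 12 pairs (ear number F, grain weight G).
    F = [9, 7, 12, 5, 10, 8, 14, 6, 11, 9, 4, 13]
    G = [7, 5, 9, 3, 8, 6, 12, 4, 7, 8, 2, 10]
    sums = (len(F), sum(F), sum(G),
            sum(f * f for f in F), sum(g * g for g in G),
            sum(f * g for f, g in zip(F, G)))
    print(correlation_and_regression(*sums))
```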
Second experiment. We have three estimates of the variance in the original 
population based on 3, 4 and 12 degrees of freedom, and we know, in the case 
of the original population being normal, the distribution of the ratios of these 
estimates. (Notice that although the third ratio is not independent of the other 
two, the three distributions are independent in the case of normal data.) The 
distributions of these three ratios were drawn up; the manner of doing this was 
to find the 1, 5, 10, 20, 40, 60, 80, 90, 95 and 99% points in the case of samples from 
normal data and to count the numbers of samples giving values lying within 
these classes. The theoretical values are calculated from the tables of the 
Incomplete Beta Function by inverse interpolation; a check is given for the two 
end classes from the tables of Fisher’s “z”. This interpolation was very difficult 
in places owing to the large tabular interval, and in certain cases the tables had 
to be recomputed at a smaller interval. 

Finally the correlation between the 20 pairs of values from the two popula- 
tions was evaluated for 336 sets of 20, and the distribution of totals of five for 
each population, and its first four moments, produced. We are now in a position 
to discuss the results of these calculations. 


6. DISCUSSION OF THE RESULTS 

First experiment. The observed distribution of the transformed correlation 
coefficient, z = ½ logₑ {(1 + r)/(1 − r)}, shown in Table II, has mean 0.988 after 
allowing for the bias, S.D. = 0.349, β₁ = 0.057, and β₂ = 3.275, the expected values 
on the basis of normal theory being 0.892, 0.328, 0, 3.22, using the expressions 
given by Fisher (1923). β₁ is almost significantly, and the variance and β₂ 
insignificantly, different from expectation. The difference in means is 0.096 and is 
very significant, its standard error being 0.018. On the whole the distribution 
agrees fairly closely with expectation except for this shift in the mean, the cause 
of which remains doubtful. There is no doubt of the correctness of the value 
0.712 for the correlation as calculated from 7200 pairs, but it is interesting to 
note that, if we combine the observations in pairs according to their position in 
the field, the correlation between the 3600 pairs of totals of grain weight and 
ear number becomes 0.667. In the work done on the experiment in which these 
figures occurred the totals were combined in 32 different ways, and the 
correlation was estimated from the totals of larger units and also from the figures for 
6-inch lengths within the larger units, and in the latter case the correlation was 
steady at about 0.76, as estimated from the “within plots” line of the Analysis 
of Covariance, if the number of 6-inch lengths in the plot was more than four. 
If we take this value as being the correlation between the two counts, then our 
sampling experiment agrees closely with expectation, as based on normal theory. 
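The figures in this paragraph can be checked with the transformation itself. The sketch below uses only the uncorrected value z = ½ logₑ{(1 + r)/(1 − r)}; Fisher's bias and variance corrections are not reproduced, so agreement with the quoted expectations is approximate, and the function name is illustrative.

```python
import math


def fisher_z(r):
    """Fisher's transformation, z = 0.5 * log((1 + r) / (1 - r))."""
    return math.atanh(r)


if __name__ == "__main__":
    observed_mean = 0.988
    for rho in (0.712, 0.76):
        z = fisher_z(rho)
        print(f"rho = {rho}: z = {z:.3f}, observed minus expected = {observed_mean - z:+.3f}")
    # Standard error of the mean of 332 values with expected S.D. 0.328.
    print("s.e. of mean:", round(0.328 / math.sqrt(332), 3))
```

With 0.712 the observed mean lies about 0.096 above expectation, more than five standard errors; with 0.76 the gap falls to less than 0.01, well within one standard error.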

The distribution of the regression of G on F has one observation far removed 
from the rest, and this has been omitted in forming estimates of the parameters 
of the distribution. (It is about 6σ from the mean and this is very unlikely in a 
sample of 332.) The mean is 0.735, S.D. = 0.245, β₁ = 0.001, β₂ = 3.51, with 
expectations 0.736, 0.234, 0, 3.86 (K. Pearson, 1926, p. 7). These values agree 
well with expectation. The purpose of the sampling being primarily to test the 
effect of non-normality on the distribution of the correlation coefficient, it was 
felt that 332 samples would give sufficiently accurate information, since we do 
not usually require great accuracy in a measure of association. The observed 
distribution of the transformed r fits a normal curve quite well, and there is no 
evidence of lack of agreement at the tails. All this suggests that considerable 
non-normality in the original distribution will not affect the distributions of 
correlation and regression coefficients even in the case of high correlation. 

TABLE II 

Sampling distributions in Experiment 1 

Transformed correlation      Correlation coefficient      Regression coefficient 
coefficient                                               of G on F 
z               Frequency    r            Frequency       Coefficient      Frequency 
−0.4 to −0.3        1        At −0.46         1           −0.2 to −0.1         1 
−0.3–               0                                     −0.1–0.0             0 
−0.2–               0        At −0.07         1           0.0–0.1              2 
−0.1–               1                                     0.1–                 4 
0.0–                0        0.10–0.14        1           0.2–                [?] 
0.1–                4        0.14–            2           0.3–                16 
0.2–                1        0.18–            1           0.4–                22 
0.3–                3        0.22–            1           0.5–                48 
0.4–               11        0.26–            1           0.6–                57 
0.5–               14        0.30–            2           0.7–                47 
0.6–               18        0.34–            0           0.8–                45 
0.7–               39        0.38–            6           0.9–                44 
0.8–               34        0.42–            5           1.0–                22 
0.9–               31        0.46–            5           1.1–                11 
1.0–               35        0.50–           10           1.2–                 3 
1.1–               37        0.54–            7           1.3–                 5 
1.2–               30        0.58–           17           1.4–                 0 
1.3–               25        0.62–           29           1.5–1.6              1 
1.4–               19        0.66–           27 
1.5–               11        0.70–           24 
1.6–               13        0.74–           30           2.2–2.3              1 
1.7–                4        0.78–           40 
1.8–1.9             1        0.82–           48 
                             0.86–           40 
                             0.90–           29 
                             0.94–0.98        5 

Total             332        Total          332           Total              332 

[?] marks an entry that is not legible. 

The comparison of estimates of variance gives frequencies 
4, 12, 16, 31, 59, 63, 81, 59, 23, 26, 9, in classes with expectations 3.83, 15.3, 
19.1, 38.3, 76.6, 76.6, 76.6, 38.3, 19.1, 15.3, 3.83, the total frequency being 383. 
There is a serious excess of observed values at the upper end of this distribution, but 
since we are using the same sets of 12 several times this may cause a bias, 
and the more extensive second experiment was carried out to examine this 
case more closely. 
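The expectations quoted here are simply the class widths, in per cent, applied to the total of 383. A minimal sketch follows, assuming the classes are bounded by the 1, 5, 10, 20, 40, 60, 80, 90, 95 and 99% points (an inference from the quoted expectations); the names are illustrative.

```python
# Widths, in per cent, of the eleven classes between the percentage points.
CLASS_WIDTHS_PERCENT = [1, 4, 5, 10, 20, 20, 20, 10, 5, 4, 1]

# Observed frequencies of the variance ratio in the first experiment.
OBSERVED = [4, 12, 16, 31, 59, 63, 81, 59, 23, 26, 9]


def expected_frequencies(total, widths_percent):
    """Expected count in each class when the normal-theory law holds."""
    return [total * width / 100 for width in widths_percent]


if __name__ == "__main__":
    expected = expected_frequencies(sum(OBSERVED), CLASS_WIDTHS_PERCENT)
    for width, obs, exp in zip(CLASS_WIDTHS_PERCENT, OBSERVED, expected):
        print(f"width {width:2d}%  observed {obs:3d}  expected {exp:6.2f}  ratio {obs / exp:4.2f}")
```

In the four highest classes the observed counts are from about 1.2 to 2.3 times their expectations, which is the excess referred to in the text.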

Second experiment. The first step is to test the randomness of the sampling 
process, since this is new. Owing to the method used the results come out in nine 
batches of 112 and for each batch separately we have the mean and variance of 
the distribution of totals of fives. These were found to agree well with expectation 
with two exceptions, one of which was due to an oversight in sampling for one 
batch of population IV; this batch was discarded. One batch from population III 
was insufficiently variable, but there was no obvious reason for this, and the 
batch was retained. The first four moments of the distribution of totals of five 
combined from the eight (or nine) batches, together with their expected values, 
taking into account the non-normality, are shown in Table III. 

TABLE III 

                     Mean    Variance     β₁       β₂ 
Population III: 
  Expected          114.7     1030      0.289    3.351 
  Observed          114.2      966      0.323    3.347 
Population IV: 
  Expected          130.6     1015      0.044    2.925 
  Observed          130.7     1027      0.072    2.915 

The variance of population III is significantly smaller than expectation, its 
S.D. being approximately 22. The other values agree closely with expectation, 
using the approximate values of the S.D. of β₁ and β₂. The S.D. of each of the 
means is about 0.5. 
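The expected constants for totals of five follow from the population constants in Table I by the usual rules for sums of independent values. The sketch below assumes independent draws from a large population; the function name is illustrative.

```python
def moments_of_total(n, mean, variance, beta1, beta2):
    """Mean, variance, beta1 and beta2 of a total of n independent values.

    Cumulants add for independent values, so the mean and variance are
    multiplied by n, beta1 is divided by n, and the excess (beta2 - 3)
    is divided by n.
    """
    return n * mean, n * variance, beta1 / n, 3 + (beta2 - 3) / n


if __name__ == "__main__":
    # Population III constants from Table I.
    mean, variance, beta1, beta2 = moments_of_total(5, 22.949, 206.0, 1.446, 4.755)
    print(f"mean {mean:.1f}  variance {variance:.0f}  beta1 {beta1:.3f}  beta2 {beta2:.3f}")
```

The same function applied to the β₁ and β₂ of population IV gives 0.044 and 2.925.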

Further, the correlation between the two values from populations III and IV 
was evaluated for 336 pairs and found to be −0.0127 with S.D. of approximately 
0.016. The manner in which the two populations were paired is the same as that 
used for the actual sampling, and was intended to produce zero correlation 
between the two variables; we can therefore consider that the adequacy of this 
method of sampling has been demonstrated. 

The observed frequencies of the ratios of variances* in the classes 0–1%, 
1–5%, etc. are given in Table IV, together with the expectations in those 
classes based on normal theory. The agreement in all cases is good, and with one 
exception there is no evidence of serious divergence at the tails. The shortage in 
the 1 % class for population III, degrees of freedom 3 : 12, is significant by itself, 


* Taken from the Analysis of Variance table given on p. 73 above. 

TABLE IV 

Distribution of ratios of estimates of variances in Experiment 2 

Population III 
                  Observed frequencies 
Class (%)         Degrees of freedom           Normal theory 
                 3:4     3:12     4:12         expectation 
 0–  1             6        1        8            10.08 
 1–  5            36       38       40            40.32 
 5– 10            49       46       59            50.40 
10– 20            78       93       90           100.80 
20– 40           206      209      204           201.60 
40– 60           211      203      212           201.60 
60– 80           189      199      199           201.60 
80– 90           114       98       99           100.80 
90– 95            66       56       49            50.40 
95– 99            41       52       41            40.32 
99–100            12       13        7            10.08 
Totals          1008     1008     1008          1008.00 
χ²              11.5     4.57     0.86 

Population IV 
                  Observed frequencies 
Class (%)         Degrees of freedom           Normal theory 
                 3:4     3:12     4:12         expectation 
 0–  1            12        9        7             8.96 
 1–  5            43       43       42            35.84 
 5– 10            48       41       41            44.80 
10– 20            97       79       81            89.60 
20– 40           179      169      160           179.20 
40– 60           160      180      164           179.20 
60– 80           173      190      180           179.20 
80– 90            88       74      118            89.60 
90– 95            48       44       52            44.80 
95– 99            35       58       38            35.84 
99–100            13        9       13             8.96 
Totals           896      896      896           896.00 
but it is the lowest of six values and there does not appear to be any trend over 
the rest of the range. The six values of χ², each based on a classification into 
5 groups, were evaluated and are 11.5, 4.57, 0.86, 4.81, 1.71 and 13.5. This is 
quite a reasonable set for 4 degrees of freedom. 
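The six values of χ² can be recomputed from Table IV. The text does not say how the eleven classes were pooled into five groups; the sketch below assumes five groups of 20% each (0–20, 20–40, 40–60, 60–80 and 80–100%), a grouping that is an assumption but that reproduces the quoted figures to within rounding.

```python
# Observed frequencies from Table IV, in class order 0-1, 1-5, ..., 99-100 %.
TABLE_IV = {
    ("III", "3:4"): [6, 36, 49, 78, 206, 211, 189, 114, 66, 41, 12],
    ("III", "3:12"): [1, 38, 46, 93, 209, 203, 199, 98, 56, 52, 13],
    ("III", "4:12"): [8, 40, 59, 90, 204, 212, 199, 99, 49, 41, 7],
    ("IV", "3:4"): [12, 43, 48, 97, 179, 160, 173, 88, 48, 35, 13],
    ("IV", "3:12"): [9, 43, 41, 79, 169, 180, 190, 74, 44, 58, 9],
    ("IV", "4:12"): [7, 42, 41, 81, 160, 164, 180, 118, 52, 38, 13],
}


def pooled_chi_square(observed):
    """Chi-square on five equal groups (assumed pooling of the 11 classes)."""
    groups = [
        sum(observed[0:4]),    # 0-20 %
        observed[4],           # 20-40 %
        observed[5],           # 40-60 %
        observed[6],           # 60-80 %
        sum(observed[7:11]),   # 80-100 %
    ]
    expected = sum(observed) / 5
    return sum((g - expected) ** 2 / expected for g in groups)


if __name__ == "__main__":
    for key, frequencies in TABLE_IV.items():
        print(key, round(pooled_chi_square(frequencies), 2))
```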


7. CONCLUSION 

Samples have been taken from four non-normal populations and the distri- 
butions of correlation coefficients, regression coefficients, and the ratio of different 
estimates of variance corresponding to degrees of freedom 3: 4, 3:12 and 4:12 have 
been found. They all agree sufficiently well with the known distributions in the 
case of normal populations for us to neglect the departure from normality in 
using these tests of significance when the original populations are of the form 
we have used. This agrees with the general conclusions reached by E. S. Pearson 
in other cases of sampling from non-normal populations. The bias found in the 
first set of ratios of estimated variances may be due to the dependence among 
samples as a result of using the same sets of 12 over and over again. 

Now that we have a method of taking random samples and of doing most 
of the subsequent computing automatically it would be of considerable interest 
to continue the sampling investigations for the further investigation of the ratios 
of estimates of variance in the cases of multiple classification, e.g. the Latin 
Square and multiple factor experiments, and for other forms of population. All 
these can be rapidly carried out with the aid of tabulating machines. 


Finally I wish to thank Dr J. Wishart for his valuable advice and continued 
interest taken in this work; also Mr J. Mandeville of the British Tabulating 
Machine Co., Ltd., and Dr L. J. Comrie for their assistance in connection with the 
parts of the work involving tabulating machines. 
A NEW MEASURE OF RANK CORRELATION 
By M. G. KENDALL 


1. In psychological work the problem of comparing two different rankings 
of the same set of individuals may be divided into two types. In the first type the 
individuals have a given order A which is objectively defined with reference to 
some quality, and a characteristic question is: if an observer ranks the individuals 
in an order B, does a comparison of B with A suggest that he possesses a reliable 
judgment of the quality, or, alternatively, is it probable that B could have arisen 
by chance? In the second type no objective order is given. Two observers con- 
sider the individuals and rank them in orders A and B. The question now is, are 
these orders sufficiently alike to indicate similarity of taste in the observers, or, 
on the other hand, are A and B incompatible within assigned limits of prob- 
ability? An example of the first type occurs in the familiar experiments wherein 
an observer has to arrange a known set of weights in ascending order of weight; 
the second type would arise if two observers had to rank a set of musical com- 
positions in order of preference. 

The measure of rank correlation proposed in this paper is capable of being 
applied to both problems, which are, in fact, formally very much the same. For 
purposes of simplicity in the exposition it has, however, been thought convenient 
to preserve a distinction between them. 


DEFINITION OF τ 
2. Consider a set of individuals, numbered from 1 to 10, whose objective order 
is that of the natural sequence 1, 2, 3, ..., 10, and consider an arbitrary ranking 
such as the following: 

4  7  2  10  3  6  8  1  5  9 

Consider the order of the nine pairs of numbers obtained by taking the first 
number, 4, with each succeeding number. The first pair, 4 7, is in the correct 
order (in the sequence of 1, 2, ..., 10), and we therefore allot it a score +1. The 
second pair, 4 2, is in the wrong order and we score -1. The third pair, 4 10, 
scores +1, and so on, the nine scores being 
+1 -1 +1 -1 +1 +1 -1 +1 +1, totalling +3. 

Consider also the scores of the second number, 7, with its eight succeeding 
numbers. They are 
-1 +1 -1 -1 +1 -1 -1 +1, totalling -2. 

The scores of the third number are 
+1 +1 +1 +1 -1 +1 +1, totalling +5. 
Proceeding thus with each number, we have 9 scores, as follows: 
+3, -2, +5, -6, +3, 0, -1, +2, +1. 

The total of these scores is +5. 


Now the maximum score, obtained if the numbers are all in the objective 
order (1, 2, ..., 10), is 45. I therefore define a rank correlation coefficient between 
a variable ranked in the objective order (1, 2, ..., 10) and the variable ranked in 
the order above as 
$$\tau = \frac{\text{actual score}}{\text{maximum possible score}} = \frac{5}{45} = +0.11.$$
Generally, if there are n individuals, the maximum score, obtained if and 
only if they are all in objective order, is 
$$(n-1) + (n-2) + \dots + 1 = \frac{n(n-1)}{2}.$$
Denoting the actual score for any given ranking by Σ, we may calculate a measure 
of the rank correlation between this ranking and the objective ranking by putting 
$$\tau = \frac{2\Sigma}{n(n-1)}. \qquad (1)$$
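
The definition can be followed step by step in a few lines. The sketch below scores every pair of the ranking 4 7 2 10 3 6 8 1 5 9 in the order in which the two members appear; the function names `kendall_score` and `kendall_tau` are illustrative and are not taken from the paper.

```python
from itertools import combinations


def kendall_score(ranking):
    """Sigma: +1 for each pair standing in the objective order, -1 otherwise."""
    # combinations() yields each pair (earlier, later) exactly once, in sequence order.
    return sum(1 if earlier < later else -1
               for earlier, later in combinations(ranking, 2))


def kendall_tau(ranking):
    """tau = 2 * Sigma / (n (n - 1)), as in equation (1)."""
    n = len(ranking)
    return 2 * kendall_score(ranking) / (n * (n - 1))


example = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]
print(kendall_score(example))          # 5
print(round(kendall_tau(example), 2))  # 0.11
```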
TWO SHORT METHODS FOR THE CALCULATION OF τ 

3. τ is calculable more easily than might appear at first sight from the above 
approach. Consider for example the order given above, viz. 
4  7  2  10  3  6  8  1  5  9 
We see that the number 1 has two numbers on its right and 7 on its left. We 
therefore score +2-7 = -5, and then strike out the 1, being left with 
4  7  2  10  3  6  8  5  9 

The number 2 has 6 numbers on its right and two on its left and hence we score 
6-2 = +4. We then strike out the 2 and proceed with the 3 and so on. It will be 
found that the scores obtained are 
-5, +4, +1, +6, -3, 0, +3, 0, -1. 

The total of these scores is +5, and is equal to Σ. 

The above rule is quite general. Its validity will be evident when it is noted 
that instead of taking the first number with each succeeding number and so on, 
as in §2, we consider the pairs contributing to Σ in a different way. Taking the 
number 1 first, and remembering that all the other numbers are greater than 1, 
we see that any number on the left must contribute -1 to Σ and any number on 
the right contributes +1. When 1 is struck out the procedure remains valid for 2, 
and so on. 
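
The striking-out rule of §3 may be sketched as follows. The function name `scores_by_striking_out` is illustrative; it takes the numbers in ascending order, scores each as (numbers to its right) minus (numbers to its left) among those not yet struck out, and then removes it.

```python
def scores_by_striking_out(ranking):
    """Scores of the short method of section 3, one per number except the largest."""
    remaining = list(ranking)
    scores = []
    for value in sorted(ranking)[:-1]:       # the largest number is left alone at the end
        position = remaining.index(value)
        to_right = len(remaining) - 1 - position
        to_left = position
        scores.append(to_right - to_left)
        remaining.pop(position)              # strike the number out
    return scores


scores = scores_by_striking_out([4, 7, 2, 10, 3, 6, 8, 1, 5, 9])
print(scores)       # [-5, 4, 1, 6, -3, 0, 3, 0, -1]
print(sum(scores))  # 5
```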
4. Alternatively, the following procedure may be adopted: 
Considering once again the order 

4  7  2  10  3  6  8  1  5  9 

we see that the first number, 4, has on its right 6 numbers which are greater. The 
second number, 7, has on its right 3 numbers which are greater. The third number, 
2, has on its right 6 numbers which are greater; and so on. The numbers so obtained 
are 
6, 3, 6, 0, 4, 2, 1, 2, 1, 
totalling 25. 

There must, therefore, be 45 - 25 = 20 numbers lying to the right of successive 
numbers in the order which are less than those numbers, and hence 
$$\Sigma = 25 - 20 = +5, \text{ as before.}$$

Generally, if the number obtained by the above method of counting greater 
numbers is k, 
$$\Sigma = 2k - \frac{n(n-1)}{2}.$$

In practice, I find this method convenient and rapid. It has, moreover, the 
advantage of providing an independent check; for if the process is repeated 
counting greater numbers which lie to the left, giving a total of, say, l, 
$$\Sigma = \frac{n(n-1)}{2} - 2l.$$
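
The counting method of §4 may be sketched as follows. The function names are illustrative. Since every pair is counted exactly once, either in k (greater number to the right) or in l (greater number to the left), the count to the left yields the same score through Σ = n(n-1)/2 - 2l, which serves as the check.

```python
def greater_to_right(ranking):
    """For each number, how many greater numbers lie to its right."""
    return [sum(later > x for later in ranking[i + 1:]) for i, x in enumerate(ranking)]


def greater_to_left(ranking):
    """For each number, how many greater numbers lie to its left."""
    return [sum(earlier > x for earlier in ranking[:i]) for i, x in enumerate(ranking)]


ranking = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]
n = len(ranking)
pairs = n * (n - 1) // 2

k_total = sum(greater_to_right(ranking))
l_total = sum(greater_to_left(ranking))

print(greater_to_right(ranking))  # [6, 3, 6, 0, 4, 2, 1, 2, 1, 0]
print(k_total, l_total)           # 25 20
print(2 * k_total - pairs)        # 5
print(pairs - 2 * l_total)        # 5
```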


5. The use of τ can now be extended to the case where no objective order 
exists. In fact, given two rankings, A and B, of the same set of individuals, τ may 
be defined as the coefficient obtained by regarding one order, A, as an objective 
order. If, for example, the orders are as follows: 


τ is given by first rearranging A as an objective order, writing below it the 
corresponding member in B, thus 

A'   1  2  3  4   5  6  7  8  9  10 
B'   4  7  2  10  3  6  8  1  5  9 

and then calculating τ in the manner of preceding paragraphs. Actually, as will 
be seen below, it is not necessary in any practical calculations to rewrite the 
orders in this way. 

6. It is a notable fact that the same coefficient τ is reached whichever of the 
two orders, A and B, is rearranged as an objective order. 
Consider again the orders given in the preceding paragraph, namely, 
me 8 Bae ES OS ee ae 
oe ee es oe ee Bt ee Se el ee 
Rearranging B as an objective order we have 

A''   8  3  5  1  9  6  2  7  10  4 
B''   1  2  3  4  5  6  7  8  9   10 

If we repeat this operation on the A'' and B'' we shall get back to A' and B'. 
A', B' and A'', B'' are thus reciprocally related and the permutations B' and A'' 
may be said to be conjugate. 

We have to show that τ is the same when calculated from B' when A' is the 
objective ranking as when calculated from A'' when B'' is the objective ranking, 
i.e. that Σ is the same for two conjugate permutations with regard to an objective 
order 1, 2, ..., n. 

In §2, the value of Σ for B' was ascertained directly, the various items entering 
into the sum being 
+3, -2, +5, -6, +3, 0, -1, +2, +1. 

Consider now the value of Σ for A'' obtained by the short method of §3. 
The sums entering into Σ will be found to be 
+3, -2, +5, -6, +3, 0, -1, +2, +1, 
i.e. exactly the same as those for B' obtained by the more direct method; and 
hence Σ and τ are the same in the two cases. 

This result is true in general. If the permutation B' begins with a number $a_1$, 
the contribution to $\Sigma_{B'}$ from pairs involving $a_1$ will be $(n-a_1)-(a_1-1)$. In A'' 
the $a_1$th number will be 1 and the contribution to $\Sigma_{A''}$ will also be $(n-a_1)-(a_1-1)$, 
in the manner of §3. If the second number in B' is $a_2$ the contribution to $\Sigma_{B'}$ will 
be $(n-a_2)-(a_2-1) \pm 1$ according to whether $a_2$ is greater than $a_1$ or not. In A'' 
the $a_2$th number will be 2, and the contribution to $\Sigma_{A''}$ is also $(n-a_2)-(a_2-1) \pm 1$ 
according to whether 1 lies to the left or the right of 2 in A'', i.e. whether $a_2$ is 
greater than $a_1$ or not; and so on. 

Thus Σ and τ are the same for two conjugate permutations with regard to the 
objective order 1, 2, ..., n. 
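
The equality of Σ for conjugate permutations can be verified numerically. In the sketch below the conjugate of a permutation is taken to be its inverse (the entry in place v is the position occupied by v), which is how A'' arises from B'; the function names are illustrative. The last statement checks every permutation of six numbers, which illustrates the general result but does not replace the argument above.

```python
from itertools import combinations, permutations


def kendall_score(ranking):
    """Sigma: +1 for each pair in the objective order, -1 otherwise."""
    return sum(1 if a < b else -1 for a, b in combinations(ranking, 2))


def conjugate(ranking):
    """Inverse permutation: place v of the result holds the position of v in `ranking`."""
    inverse = [0] * len(ranking)
    for position, value in enumerate(ranking, start=1):
        inverse[value - 1] = position
    return inverse


b_prime = [4, 7, 2, 10, 3, 6, 8, 1, 5, 9]
a_double_prime = conjugate(b_prime)
print(a_double_prime)                                         # [8, 3, 5, 1, 9, 6, 2, 7, 10, 4]
print(kendall_score(b_prime), kendall_score(a_double_prime))  # 5 5

# Exhaustive check over all 720 permutations of 1..6.
assert all(kendall_score(p) == kendall_score(conjugate(p))
           for p in permutations(range(1, 7)))
```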

7. In practical cases, the value of τ may be found as follows: 

Write down above the given rankings the objective ranking. In the example 
already considered this would give 
The number 1 in B has an 8 above it in A. In the objective ranking 8 has two 
numbers to the right and seven to the left. Score, therefore, -5 and strike out 
the 8 in the objective ranking. The number 2 in B has a 3 above it in A, and 3 in 
the objective ranking has six numbers to its right (ignoring the number struck 
out) and two to its left, score +4; and so on, the scores being 

-5, +4, +1, +6, -3, 0, +3, 0, -1, 
totalling +5, which is equal to Σ. 

8. τ satisfies certain elementary requirements of a measure of rank correlation. 
It is +1 if and only if correspondence between the rankings of A and B is 
perfect. It is -1 if and only if the rankings are exactly inverted. For intermediate 
values it appears to provide a satisfactory measure of the correspondence 
between the two rankings. A few examples for n = 10 will give some idea of the 
scale of measurement which it provides (an objective order 1, 2, ..., 10 is taken 
in each case): 


Order                                  τ         ρ* 
4  7  2  10  3  6  8  1  5  9        +0.11     +0.14 
i @2 Fee 40 6                        +0.56     +0.64 
7 BE °32 C395. 2: 3                  -0.24     -0.37 
S 64 733:2.9 %..4                    +0.02     +0.03 
10  1  2  3  4  5  6  7  8  9        +0.60     +0.45 
10  9  8  7  6  1  2  3  4  5        -0.56     -0.76 


In the case where no objective ranking exists τ measures the closeness of 
correspondence between two given rankings in the sense that it measures how 
accurate either ranking would be if the other were objective. In other words it 
measures the compatibility of two rankings. 
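
The two scales may be compared directly. The sketch below evaluates τ and the Spearman coefficient ρ = 1 - 6S(d²)/(n³ - n) for three orders of ten ranks against the objective order 1, 2, ..., 10: the example of §2 and two simple orders chosen here for illustration. The function names are illustrative.

```python
from itertools import combinations


def tau(ranking):
    """Kendall's coefficient against the objective order 1, 2, ..., n."""
    n = len(ranking)
    score = sum(1 if a < b else -1 for a, b in combinations(ranking, 2))
    return 2 * score / (n * (n - 1))


def spearman_rho(ranking):
    """Spearman's coefficient, with d the difference between a rank and its position."""
    n = len(ranking)
    sum_d2 = sum((value - position) ** 2
                 for position, value in enumerate(ranking, start=1))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


orders = [
    [4, 7, 2, 10, 3, 6, 8, 1, 5, 9],
    [10, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    [10, 9, 8, 7, 6, 1, 2, 3, 4, 5],
]
for order in orders:
    print(order, round(tau(order), 2), round(spearman_rho(order), 2))
# tau, rho: 0.11, 0.14 / 0.6, 0.45 / -0.56, -0.76
```

The second order shows the two coefficients disagreeing in rank: a single displaced member costs ρ more than it costs τ.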

9. For the purpose of measuring correlation between ranks, therefore, τ 
appears to compare favourably with ρ. It is admitted that ρ can take 
$\frac{n^3-n}{6}+1$ values between -1 and +1, whereas τ can take only 
$\frac{n^2-n}{2}+1$ values in the range. 
This does not, however, appear to constitute a serious disadvantage to the 
sensitivity of τ. 

On the other hand, τ possesses one marked advantage over ρ, in that it is not 
difficult to find the distribution of values obtained by correlating a given ranking 
with the members of a universe in which all possible rankings occur equally 


* Throughout this paper ρ means the Spearman coefficient of rank correlation defined by 
$$1 - \frac{6S(d^2)}{n^3 - n},$$
where d is a difference in ranks. 
frequently. It is shown below that the distribution of τ tends to normality for 
large n, resembling ρ in this respect; but in fact τ is surprisingly close to normality 
even for low values of n, whereas the distribution for ρ has not yet been given, and 
appears to present peculiar features.* 


THE SAMPLING DISTRIBUTION OF Σ 


10. To judge the significance of an observed value of τ or of Σ in the case 
where an objective order is given, we wish to know whether the value could 
have arisen by chance from a universe in which all the possible rankings of the 
n objects occur an equal number of times. It is, therefore, necessary to consider 
the distribution of Σ in such a universe. The distribution of τ may be found at 
once from that of Σ by dividing the variate values of Σ by n(n-1)/2. 

The same distribution may be used to judge the significance of a value of τ 
expressing the compatibility of two rankings. A significantly negative τ, for 
example, would mean that if one ranking is taken to be objective the other has not, 
as judged by the τ-distribution, arisen by chance from the universe in which all 
possible rankings occur equally frequently; in other words that the two rankings 
are significantly incompatible. 

Consider then the universe of values of Σ obtained from an objective order 
1, 2, 3, ..., n and the n! possible permutations of the first n integers. Let the 
number of values of a given Σ be denoted by $u_{n,\Sigma}$. Consider a given ranking of 
the numbers 1, ..., n, and the effect of inserting an additional number (n+1) in 
the various possible places in this ranking, from the first place (preceding the first 
member of the rank of n) to the last place (following the last member of the 
rank of n). 

Inserting the number (n+1) at the beginning will add -n to the value of Σ. 
Inserting it between the first and second members will add -(n-2) to Σ. 
Inserting it between the second and third will add -(n-4) to Σ, and so on. Adding 
the number (n+1) at the end will add +n to Σ. 

It follows that 
$$u_{n+1,\Sigma} = u_{n,\Sigma-n} + u_{n,\Sigma-n+2} + u_{n,\Sigma-n+4} + \dots + u_{n,\Sigma+n-2} + u_{n,\Sigma+n}. \qquad (2)$$
This recursion formula permits of the calculation of the frequency array of Σ. 
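
The recursion may be sketched in code as follows. Frequencies are held in a dictionary keyed by the value of Σ, and the step from n to n+1 adds each of the increments -n, -n+2, ..., +n described above; the function names are illustrative.

```python
from math import factorial


def next_frequencies(u_n, n):
    """One step of recursion (2): frequencies of Sigma for n + 1 objects from those for n."""
    u_next = {}
    for sigma, frequency in u_n.items():
        # Inserting the number n + 1 in each of the n + 1 places adds -n, -n+2, ..., +n.
        for increment in range(-n, n + 1, 2):
            u_next[sigma + increment] = u_next.get(sigma + increment, 0) + frequency
    return u_next


def frequencies(n):
    """Frequency of each value of Sigma over the n! rankings of n objects."""
    u = {0: 1}                      # a single object has one ranking, with Sigma = 0
    for m in range(1, n):
        u = next_frequencies(u, m)
    return u


u4 = frequencies(4)
print([u4[s] for s in sorted(u4)])                         # [1, 3, 5, 6, 5, 3, 1]
print(sum(frequencies(7).values()) == factorial(7))        # True
```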
11. If n = 2, there are two values of Σ, +1 and -1, i.e. $u_{2,1} = u_{2,-1} = 1$, 
$u_{2,0} = 0$. From (2) we have 
$$u_{3,\Sigma} = u_{2,\Sigma-2} + u_{2,\Sigma} + u_{2,\Sigma+2},$$

* The fact that ρ tends to normality for large n has recently been proved by Hotelling & Pabst 
(1936). The remarks above on the behaviour of ρ for low values of n are founded on an expression 
for the sampling distribution of ρ which will be discussed in a further communication shortly to 
be published. This communication will also deal with the relation between τ and ρ. 
the possible values of Σ ranging from -3 to +3. By substituting Σ = -3, ..., +3 
in the above, we find 
$$u_{3,3} = 1, \quad u_{3,2} = 0, \quad u_{3,1} = 2, \quad u_{3,0} = 0,$$
and similar values for the negative values of Σ. 
Applying equation (2) again we find 
$$u_{4,6} = 1, \quad u_{4,5} = 0, \quad u_{4,4} = 3, \quad u_{4,3} = 0,$$
$$u_{4,2} = 5, \quad u_{4,1} = 0, \quad u_{4,0} = 6, \quad \text{etc.}$$

The successive arrays of u may in fact be built up by the following process: 

1  1 
   1  1 
      1  1 
------------ 
1  2  2  1 
   1  2  2  1 
      1  2  2  1 
         1  2  2  1 
--------------------- 
1  3  5  6  5  3  1 
etc. 

At each stage, to find the array for (n+1) we write down the n-array (n+1) 
times, one under the other and moving one place to the right each time, and then 
sum the (n+1) arrays. If the total array has a central value, that value is the 
frequency for Σ = 0, and all values of Σ must be even. If the total array has two 
central values, these values are the frequencies for Σ = ±1, and all values of Σ 
must be odd. 

12. The above procedure may be condensed by forming a kind of figurate 
triangle as follows: 

1    1 
2    1  1 
3    1  2  2  1 
4    1  3  5  6  5  3  1 
5    1  4  9  15  20  22  20  15  9  4  1 
etc. etc. 

In this array, a number in the rth row is the sum of the number immediately 
above it and the (r-1) numbers to the left of that number. The formation of the 
array is quite simple and several devices shorten the arithmetic. For instance, in 
part of the array towards the left a number in the rth row is the sum of the number 
immediately above it and the number immediately to the left. A check is provided 
by the fact that the total in the rth row is r!. 
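
The rule for forming the triangle may be sketched as follows, with the rows left-aligned as in the array above and missing entries counted as zero. The function name `figurate_triangle` is illustrative.

```python
from math import factorial


def figurate_triangle(rows):
    """Rows 1..rows of the triangle; row r lists the frequencies of Sigma for r objects."""
    triangle = [[1]]
    for r in range(2, rows + 1):
        above = triangle[-1]
        width = len(above) + r - 1
        row = []
        for j in range(width):
            # The entry above (column j) plus the r - 1 entries to its left,
            # clipped to the entries that actually exist in the row above.
            lowest = max(0, j - (r - 1))
            highest = min(j, len(above) - 1)
            row.append(sum(above[lowest:highest + 1]))
        triangle.append(row)
    return triangle


for r, row in enumerate(figurate_triangle(5), start=1):
    print(r, row, sum(row) == factorial(r))
# last line printed: 5 [1, 4, 9, 15, 20, 22, 20, 15, 9, 4, 1] True
```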

The following table shows the frequency distribution of Σ for values of n 
from 1 to 10. 

The frequency polygon of the distribution is quite close to normality even 
for n = 6. For n = 10 the correspondence is very good over the material part 
of the range, as may be judged roughly by drawing the frequency polygon of Σ 
and the normal curve with the same area and standard deviation. On an ordinary 
scale the two curves are hardly distinguishable by the eye above Σ = 5. 


STANDARD ERROR OF τ 


13. A little consideration of the above method of obtaining the frequency 
distribution of Σ will show that the distribution may be arrayed by the function: 
$$f = (x^{-1} + x)(x^{-2} + 1 + x^{2})(x^{-3} + x^{-1} + x + x^{3}) \dots (x^{-(n-1)} + x^{-(n-3)} + \dots + x^{n-3} + x^{n-1}). \qquad (3)$$
The coefficient of $x^{\Sigma}$ in f is the frequency of Σ in the distribution. 
If we differentiate f with respect to x and then multiply by x the coefficient of 
$x^{\Sigma}$ is multiplied by Σ. Writing then θ for the operator $x \frac{\partial}{\partial x}$ we have 
$$\mu_0 \mu_1 = (\theta f)_{x=1},$$
and generally 
$$\mu_0 \mu_r = (\theta^r f)_{x=1}, \qquad (4)$$
where $\mu_r$ is the rth moment of Σ and $\mu_0 = n!$ is the total frequency. 
Applying equation (4) when r = 2, I find 
$$\mu_2 = \frac{n(n-1)(2n+5)}{18}, \qquad (5)$$
and hence the standard error of τ is given by 
$$\sigma_\tau = \frac{1}{3}\sqrt{\frac{2(2n+5)}{n(n-1)}}, \qquad (6)$$
which, as n becomes large, gives 
$$\sigma_\tau \sim \frac{2}{3}\,\frac{1}{\sqrt{n}}. \qquad (7)$$
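
Equations (5) to (7) can be checked against the exact distribution. The sketch below expands the product f of equation (3) as a dictionary of exponents and coefficients, computes the second moment exactly, and compares it with n(n-1)(2n+5)/18; the function names are illustrative.

```python
from fractions import Fraction
from math import sqrt


def sigma_frequencies(n):
    """Expand f of equation (3): maps each value of Sigma to its frequency."""
    f = {0: 1}
    for m in range(1, n):                       # factor x^-m + x^-(m-2) + ... + x^m
        g = {}
        for exponent, coefficient in f.items():
            for k in range(-m, m + 1, 2):
                g[exponent + k] = g.get(exponent + k, 0) + coefficient
        f = g
    return f


def second_moment(n):
    """Exact second moment of Sigma over the n! rankings."""
    f = sigma_frequencies(n)
    return Fraction(sum(c * s * s for s, c in f.items()), sum(f.values()))


def standard_error_tau(n):
    """Equation (6)."""
    return sqrt(2 * (2 * n + 5) / (n * (n - 1))) / 3


for n in (5, 10, 30):
    assert second_moment(n) == Fraction(n * (n - 1) * (2 * n + 5), 18)      # equation (5)
    # exact standard error (6) beside the large-n form (7)
    print(n, round(standard_error_tau(n), 4), round(2 / (3 * sqrt(n)), 4))
```

The printed pairs show how slowly the large-n form (7) approaches (6), so for moderate n the exact expression is the safer one to use.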

Table II shows the proportion of the total frequencies falling outside 
ranges ±σ, ±2σ, ±3σ for some of the distributions of Table I. 

The expected values on the hypothesis of a normal distribution are 0.3173, 
0.0455, 0.0027 and it is clear that for most practical purposes in testing the 
significance of an observed τ for n = 10 or greater, the standard error may be 
used in the ordinary way. 
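
The proportions of Table II can be recomputed from the exact distribution. The sketch below counts the rankings for which Σ lies strictly outside ±kσ, with σ² taken from equation (5); the function names are illustrative, and the strict inequality is an assumption about how the table was formed, though it reproduces the tabulated entries that have been checked.

```python
from math import sqrt


def sigma_frequencies(n):
    """Frequency of each value of Sigma, from the product f of equation (3)."""
    f = {0: 1}
    for m in range(1, n):
        g = {}
        for exponent, coefficient in f.items():
            for k in range(-m, m + 1, 2):
                g[exponent + k] = g.get(exponent + k, 0) + coefficient
        f = g
    return f


def proportion_outside(n, k):
    """Proportion of the n! rankings with |Sigma| greater than k standard deviations."""
    f = sigma_frequencies(n)
    sigma = sqrt(n * (n - 1) * (2 * n + 5) / 18)
    outside = sum(c for s, c in f.items() if abs(s) > k * sigma)
    return outside / sum(f.values())


for n in range(6, 11):
    print(n, [round(proportion_outside(n, k), 4) for k in (1, 2, 3)])
```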


14. Applying equation (4) when r = 4, I find 
TABLE II 
Proportion of frequencies of the distribution of Σ falling in certain ranges 

                  Proportion falling outside range 
  n     σ_Σ        ±σ        ±2σ        ±3σ 
  6     5.32      0.272     0.056     0.0000 
  7     6.66      0.381     0.030     0.0004 
  8     8.08      0.275     0.031     0.0004 
  9     9.59      0.359     0.045     0.0009 
 10    11.18      0.291     0.047     0.0009 


From this β₂ may be obtained and it is evident that as n becomes large β₂ 
tends to the value 3. In fact it remains below that value, so that the distribution 
of Σ and therefore of τ is slightly platykurtic. The following table shows the 
values of β₂ for some values of n. The corresponding values of β₂ for the 
distribution of ρ are also given and it will be observed that, as judged by β₂, the 
approach of τ to normality is appreciably quicker than that of ρ. 


TABLE III 
Values of β₂ in the distribution of Σ and of ρ for certain values of n 

  n      β₂(Σ)     β₂(ρ) 
  5      2.53      2.07 
 10      2.78      2.54 
 20      2.89      2.77 
 30      2.93      2.85 

In general, as will be seen below, the moment of order 2s is a polynomial of 
degree 3s in n. 


PROOF OF THE NORMALITY OF τ FOR LARGE n 


15. We shall prove that as n → ∞ 
$$\mu_{2s} \sim \frac{(2s)!}{2^s\, s!} (\mu_2)^s,$$
where $\mu_{2s}$ is the 2sth moment of the distribution of Σ. In virtue of the symmetry 
of the distribution moments of odd order vanish and it follows from the Second 
Limit Theorem of Probability (see Fréchet & Shohat, 1931) that the distribution 
of Σ, and hence that of τ, tends to normality in the sense that the frequency 
between $\tau_1$ and $\tau_2$ tends to 
$$\frac{1}{\sigma\sqrt{2\pi}} \int_{\tau_1}^{\tau_2} e^{-x^2/2\sigma^2}\, dx.$$
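
The limiting relation can be watched numerically before the proof is followed. The sketch below computes the ratio μ₂ₛ/(μ₂)ˢ from the exact distribution of Σ for a few values of n and sets it beside the normal value (2s)!/(2ˢ s!), which is 3 for s = 2 and 15 for s = 3. The function names are illustrative, and a numerical illustration of this kind is of course no substitute for the demonstration.

```python
from fractions import Fraction
from math import factorial


def sigma_frequencies(n):
    """Frequency of each value of Sigma, from the product f of equation (3)."""
    f = {0: 1}
    for m in range(1, n):
        g = {}
        for exponent, coefficient in f.items():
            for k in range(-m, m + 1, 2):
                g[exponent + k] = g.get(exponent + k, 0) + coefficient
        f = g
    return f


def moment(f, r):
    """Exact rth moment about the origin (the mean of Sigma is zero by symmetry)."""
    return Fraction(sum(c * s ** r for s, c in f.items()), sum(f.values()))


def moment_ratio(n, s):
    """mu_{2s} / (mu_2)^s for n objects."""
    f = sigma_frequencies(n)
    return float(moment(f, 2 * s) / moment(f, 2) ** s)


def normal_limit(s):
    """(2s)! / (2^s s!), the corresponding ratio for a normal distribution."""
    return factorial(2 * s) // (2 ** s * factorial(s))


for s in (2, 3):
    print(s, normal_limit(s), [round(moment_ratio(n, s), 3) for n in (10, 20, 40)])
```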

16. Consider the effect of operating on the product f (equation (3)) by 
$\theta = x \frac{\partial}{\partial x}$. The first operation will result in a sum of terms of type 
$$\{-r x^{-r} - (r-2) x^{-(r-2)} - \dots + (r-2) x^{r-2} + r x^{r}\}$$
multiplied by the remaining terms of f unchanged. When x is put equal to unity 
we may write this as the sum of terms 
$$n!\, \frac{-r - (r-2) - \dots + (r-2) + r}{1 + \dots + 1 + 1} = \frac{n!}{r} \{-r - (r-2) - \dots + (r-2) + r\}.$$
Similarly the second operation will bring out terms like 
$$\frac{n!}{r} \{r^2 + (r-2)^2 + \dots + (r-2)^2 + r^2\}$$
and 
$$n!\, \frac{1}{r} \{-r - (r-2) - \dots + (r-2) + r\}\, \frac{1}{t} \{-t - (t-2) - \dots + (t-2) + t\}.$$
Generally, operating 2s times will bring out terms like 
$$\frac{n!}{r} \{r^{2s} + (r-2)^{2s} + \dots + (r-2)^{2s} + r^{2s}\},$$
$$\frac{n!}{r} \{-r^{2s-1} - (r-2)^{2s-1} - \dots + (r-2)^{2s-1} + r^{2s-1}\}\, \frac{1}{t} \{-t - (t-2) - \dots + (t-2) + t\},$$
etc. 

When x is put equal to unity any term beginning with an odd superscript in 
the powers will vanish. Consider now the sum of terms like 
$$\frac{1}{r} \{r^2 + \dots + r^2\}\, \frac{1}{t} \{t^2 + \dots + t^2\} \cdots \frac{1}{u} \{u^2 + \dots + u^2\} \qquad (9)$$
containing s factors. 

It will be proved below that this term contributes the greatest power of n to 
the total sum giving $\mu_0 \mu_{2s}$. 

Further, in virtue of the multinomial form of Leibniz' theorem, the factor 
by which this term is multiplied in the expansion of $(\theta^{2s} f)$ is 
$$\frac{(2s)!}{(2!)^s}.$$
Hence, since $\mu_0 = n!$ we have 
$$\mu_{2s} \sim \frac{(2s)!}{2^s} \{\text{sum of terms like (9)}\}.$$
Each term in (9) is of type 
$$\frac{1}{r} \{r^2 + (r-2)^2 + \dots + (r-2)^2 + r^2\},$$
i.e. is of order $r^2/3$. The summation will therefore tend to $1/3^s$ times the sum 
of terms like 
$$\{1^2 . 2^2 . \dots . s^2\},$$
each term containing s squares of the numbers 1, 2, ..., (n-1). 
Call this sum $\Pi_s$. 
Then $\Pi_s$ is $\frac{1}{s!}$ times the sum of terms in 
$$\{1^2 + 2^2 + \dots + (n-1)^2\}^s \qquad (11)$$
which contain s different factors. 

Now (11) is of order $\frac{n^{3s}}{3^s} \sim 3^s (\mu_2)^s$. Hence if the product term $\Pi_s$ tends to the 
sum (11) divided by s!, 
$$\Pi_s \sim \frac{3^s (\mu_2)^s}{s!},$$
and in virtue of (10) 
$$\mu_{2s} \sim \frac{(2s)!}{2^s\, s!} (\mu_2)^s.$$

To complete the demonstration, we have therefore to show that (11) tends 
asymptotically to the sum of its terms $s!\,\Pi_s$, i.e. that sums of terms like 
$$\{1^4 . 2^2 . \dots . (s-1)^2\}, \quad \{1^6 . 2^2 . \dots . (s-2)^2\}$$
tend in comparison to zero. 
This may be shown inductively. 
Consider first of all 
$$\{1^2 + 2^2 + \dots + (n-1)^2\}^2 = 2\Pi_2 + 1^4 + 2^4 + \dots + (n-1)^4.$$
The expression on the left $\sim \frac{n^6}{9}$. But the sum of fourth powers on the right $\sim \frac{n^5}{5}$, 
which is of lower order. Hence the expression on the left $\sim 2\Pi_2$. Multiplying by 
$\{1^2 + 2^2 + \dots + (n-1)^2\}$ we have 
$$\{1^2 + 2^2 + \dots + (n-1)^2\}^3 \sim 2\Pi_2 \{1^2 + 2^2 + \dots + (n-1)^2\} \sim 6\Pi_3 + \text{terms of type } 1^4 . 2^2.$$
These terms will be less in sum than 
$$2\{1^2 + 2^2 + \dots + (n-1)^2\}\{1^4 + 2^4 + \dots + (n-1)^4\},$$
which $\sim 2\, \frac{n^3}{3}\, \frac{n^5}{5}$, of order 8. But the expression on the left is of order 9. Hence 
$\{1^2 + 2^2 + \dots + (n-1)^2\}^3 \sim 6\Pi_3$ and so on. 

We can now justify the assertion that the maximum power of n arises from 
terms like $(1^2 . 2^2 . \dots . s^2)$. In fact, by a similar line of reasoning to that just given 
it will be seen that sums of terms of type $\{1^4 . 2^2 . \dots . (s-1)^2\}$, etc. are of lower 
order. 

The demonstration is complete. 
17. It appears therefore that the coefficient τ has a good claim to serious 
consideration as a measure of rank correlation. It is easily calculable. In the 
important case of the distribution wherein all possible rankings occur equally 
frequently its standard error is known; for the values of n likely to be required in 
practice it may be taken to be normally distributed; and where there is doubt 
the distribution can be obtained in an exact form. 

It should also be remarked that τ has a natural significance. An observer who 
is given a set of objects (such as coloured discs) to rank appears to follow a process 
something like this: First of all he searches for the beginning of the series, say the 
disc of lightest shade. Having selected a disc, he compares it with each of the 
remainder to verify the propriety of his choice. The coefficient τ gives him one 
mark for each comparison which is made correctly, and subtracts a mark for 
each error.* When the first disc is selected, he proceeds as before with a second; 
and so on. τ follows this process exactly. It appears to be a logical measure of 
ranking carried out by the process and should therefore prove useful in 
psychological work. 


REFERENCES 
FRÉCHET, M. & SHOHAT, J. (1931). “A proof of the generalised second limit theorem in the 
theory of probability.” Trans. Amer. Math. Soc. 33, 533. 
HOTELLING, H. & PABST, M. R. (1936). “Rank correlation and tests of significance involving no 
assumptions of normality.” Ann. Math. Statist. 7, 29. 


* Inasmuch as comparisons between extremes in the series will generally be easier than 
comparisons between neighbouring members it might in some cases be preferable to weight the marking 
given to different comparisons according to some selected scale. The determination of such a scale, 
however, would depend to some extent on the circumstances of individual cases and would present 
considerable difficulty where no objective order is known to exist, apart from adding greatly to 
the complexity of the distribution of the measure so obtained. 











INDIAN RACES IN THE UNITED STATES. 
A SURVEY OF PREVIOUSLY PUBLISHED 
CRANIAL MEASUREMENTS* 


By GERHARDT VON BONIN AND G. M. MORANT 
1. INTRODUCTION 


To the physical anthropologist the American aborigines present some most 
interesting problems. Are they a homogeneous population or do they show racial 
divergences similar to those found for the populations of other continents? 
For how long have they inhabited the New World, and how did they migrate 
into it originally? Answers to these questions should give us not only a sound 
knowledge of the American Indians, but should also afford a further insight 
into human evolution. 

The present paper is intended as but a modest contribution towards the 
solution of such problems. It represents a statistical discussion of the existing 
craniological material, with the object in view of arriving at a racial classification 
similar to those already given for the greater part of the Old World and for 
Australia (Kitson, 1931; Morant, 1925, 1927, 1928; Woo & Morant, 1932). It is 
further restricted in its scope to the area of the United States. Treatment at the 
same time of data for Canadian Indian peoples would have been convenient, but 
in fact there is no suitable material available for them. 

Almost a century ago, Morton (1839) published measurements on 147 American 
Indian skulls of adults. In his preface he remarks that his ample material had 
enabled him “to give a full exposition of a subject which was long involved in 
doubt and controversy’’. Unfortunately, his craniometric technique has now 
become obsolete. The next notable contribution to the subject was published in 
1892 when Virchow provided descriptions of twenty-eight skulls, from different 
tribes, including artificially deformed specimens. Clearly, his data do not lend 
themselves to a statistical treatment. 

In recent years a much larger amount of material has been published by 
Hrdlička, by Gifford and by Hooton. It is possible to attempt a racial 
classification on this basis, although even with as many as 1167 undeformed male skulls 
it can only be of a provisional nature. Far more evidence would be required for 
the “full exposition” which Morton had in mind. 


* Joint contribution from the Department of Anatomy, University of Illinois, Chicago, and 
from the Galton Laboratory, University College, London. 
2. THE SOURCES OF THE MATERIAL AND THE METHODS OF TREATING IT EMPLOYED 


The measurements of crania of Indians of the United States to be discussed 
were taken from the following publications: 

(a) Aleš Hrdlička, “The Anthropology of Florida”, Publications of the Florida 
State Historical Society, No. 1 (1922), pp. 140. Eight absolute and eight indicial 
measurements are given where possible for each skull. 

(b) Edward Winslow Gifford, “Californian Anthropometry”, Univ. Calif. 
Publ. Amer. Archaeol. Ethn. 22 (1926), 217-390. Individual measurements taken 
by several anthropologists are given, and the number of characters recorded is 
not the same for all the series. For the best described skulls fourteen absolute 
measurements (including the heights and breadths of both orbits) and eight 
indicial measurements are given. Nothing is said about the techniques followed 
by the different observers, but these are apparently considered to be identical 
for all practical purposes, at any rate, and to give results directly comparable 
with Hrdlička's. 

(c) Aleš Hrdlička, “Catalogue of Human Crania in the United States National 
Museum Collections. The Algonkin and Related Iroquois; Siouan, Caddoan, 
Salish and Sahaptin, Shoshonean, and Californian Indians”, Proc. U.S. Nat. 
Mus. 69, Art. 5 (1927), 1-127. There are eleven absolute and seven indicial 
measurements recorded in this part of the Catalogue. 

(d) Aleš Hrdlička, “Catalogue of Human Crania in the United States National 
Museum Collections. Pueblos. Southern Utah Basket-Makers. Navaho”, Proc. 
U.S. Nat. Mus. 78, Art. 2 (1931), 1-95. The measurements given comprise all 
those in the 1927 part of the Catalogue together with seven other chords, three 
other indices, two angles and one measurement of the mandible. 

(e) Earnest Albert Hooton, The Indians of Pecos Pueblo. A Study of their 
Skeletal Remains, New Haven (1930), pp. xxvii+391. It is said to be improbable 
that any of the skeletons reported on are much more than 1000 years old, and 
they cover a period extending down to the early nineteenth century. Individual 
measurements are not given, but means and standard deviations are recorded 
for a number of groups. The characters treated include nearly all in Hrdlička's 
tables and a number of others which are not available for any other North 
American series. 

No adequate definitions of the measurements recorded are given in any of 
the above sources, though Dr Hrdlička (1919) has elsewhere described his 
technique in detail.* It is based on the International (Monaco) Agreement of 1906, 

* The following symbols are used to denote measurements in tables below: C = capacity, 
L = maximum glabella-occipital length, B = maximum calvarial breadth, H′ = basio-bregmatic 
height, LB = chord nasion to basion, GL = chord basion to alveolar point, G′H = chord nasion to 
alveolar point, J = maximum bizygomatic breadth, NB = maximum breadth of pyriform aperture, 
NH = chord from nasion to subnasal point, O₁′ (R or L) = orbital breadth from dacryon, and O₂ 
(R or L) = orbital height perpendicular to O₁′. N∠, A∠, and B∠ are the angles of the fundamental 
triangle of which the apices are the nasion, alveolar point and basion. 
but some modifications are introduced. It was to be expected that the 
measurements of the other American observers were taken by following either Hrdlička's 
instructions or those of the Monaco scheme. No reason to question this 
assumption was found except in the case of the orbital breadths recorded in Gifford's 
paper. It is shown to be extremely probable (see pp. 99-101 below) that these were 
not obtained in accordance with Hrdlička's definition, and hence the means of the 
orbital breadths and indices for Gifford's series were omitted in making 
comparisons with others. Hrdlička's definitions are discussed (Morant, 1937, pp. 2-4) 
in a paper dealing with his Eskimo material. Considerably more than half of 
the skulls treated below were measured by him and his assistants. A few of his 
Californian series may be supposed to represent the same populations as a few of 
Gifford's, but otherwise there is no duplication of this kind. It is to be regretted 
that the craniologists cited failed to record a number of customary measurements. 
There are no arcs available for any of the series except Prof. Hooton's Pecos 
Pueblo, and this is unfortunately found to be unsuitable for comparative 
purposes. 

Karl Pearson’s method of the coefficient of racial likeness is used in the
treatment of the material given below, both in considering how suitable series
can best be made up, and in estimating the resemblances of the types defined by
the series finally selected.* This method has recently been criticized with little
regard to the fact that its limitations and imperfections were fully recognized by
its inventor, or to the way in which it has been used in practice for more than
10 years. For practical purposes, the crude coefficient of racial likeness remains
still the best means to estimate whether two samples may be considered to
represent the same population or not, and the reduced coefficient remains an
effective criterion of the presence or absence of a racial bond between two
differentiated samples. Past experience gives no reason to believe that the
method of the coefficient of racial likeness fails to provide close approximations
to the results which could be obtained by applying theoretically more correct
formulae, such as those taking into account all the intercorrelations of the
measurements used, but which have the disadvantage of involving many times as
much arithmetical labour. In the present case it is unfortunate that the available
data are so limited that coefficients can be computed from only 11 to 18
characters instead of from 31 as has been done in the past whenever possible. The
desirability of using this large number of characters has repeatedly been pointed
out. But the method of the coefficient of racial likeness, being admittedly a
“stop-gap”, is not a simple rule of thumb. The way in which its values have to be
interpreted in order to yield useful results has to be determined empirically. It is
precisely this point that the present paper will throw into relief as we shall see
subsequently. In calculating all the coefficients the standard deviations of the
long Egyptian E (see Pearson & Davin, 1924) series were used, and it is shown in
section 8 that these are remarkably close to the average standard deviations for
the American Indian series.

* The formulae used in practice to compute the crude and reduced coefficients are given by
Cleaver (1937, pp. 100, 102).
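
The arithmetic connecting the crude and reduced coefficients can be sketched as follows. The formulae themselves are not reproduced in this paper (they are cited from Cleaver, 1937), so the sketch assumes the usual Pearson form: the probable error of a crude coefficient based on M characters is taken as 0.67449 √(2/M), and the reduced coefficient is obtained by multiplying the crude one and its probable error by 50 (n̄₁ + n̄₂)/(n̄₁ n̄₂). The function names are illustrative only.

```python
import math


def crude_probable_error(m_characters):
    """Assumed probable error of a crude coefficient based on M characters."""
    return 0.67449 * math.sqrt(2.0 / m_characters)


def reduction_factor(n1, n2):
    """Assumed factor 50 * (n1 + n2) / (n1 * n2); n1, n2 are mean sample sizes."""
    return 50.0 * (n1 + n2) / (n1 * n2)


def reduced_coefficient(crude, m_characters, n1, n2):
    """Return (reduced coefficient, its probable error) under the assumptions above."""
    factor = reduction_factor(n1, n2)
    return crude * factor, crude_probable_error(m_characters) * factor


# Santa Rosa (mean n = 19.7) and Santa Cruz (mean n = 64.3) Islands:
# crude coefficient -0.38 for 15 characters, as quoted under Table II.
value, pe = reduced_coefficient(-0.38, 15, 19.7, 64.3)
print(f"{value:.2f} +/- {pe:.2f}")  # -1.26 +/- 0.82
```

Under these assumptions the Santa Rosa and Santa Cruz entry of Table II is recovered from the crude value, and the probable errors of 0.29, 0.26 and 0.25 quoted in the text for 11, 13 and 14 or 15 characters follow from the same expression.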


3. COMPARISONS OF SERIES OF MALE CRANIA OF CALIFORNIAN INDIANS 


The Californian data will be treated first as they are more adequate than those 
for any other small group of North American Indians. In the 1927 section of his 
Catalogue (pp. 102-25) Hrdlička gives individual measurements of 200 male
Californian crania. They are divided into ten series on a geographical basis, and 
on account of their small sizes no use has been made of six of these series in the 
present paper. The means for the remaining four are in Table I below,* and the 
reduced coefficients of racial likeness between them are in Table II. The localities 
from which the material was obtained are indicated approximately on the map 
(Fig. 1). 

The two mainland series are clearly differentiated from one another and from 
the two island series, but these last give a negative coefficient. The fact that they 
cannot be distinguished is not surprising, as the islands are only five miles apart. 
The identity in type of the two populations cannot be considered well established, 
however, in view of the small size of the Santa Rosa sample. The means for the 
combined series from the two islands were calculated and these lead to the reduced 
coefficients given in the last column of Table II. The neighbouring island and 
mainland (Santa Barbara County) series are seen to be very similar in type, while 
the San Francisco series bears a closer resemblance to the Santa Barbara than to
the island series. There is thus a correspondence between the resemblances of
these types and the geographical positions of the populations they represent.

In his 1926 paper Gifford gives individual measurements taken by himself and 
other anthropologists of series of Californian Indian crania preserved in several 
museums, and these are all different from the specimens which appear in Hrdlička’s
Catalogue. It is presumed that the definitions used by all the observers accord
with Hrdlička’s, and comparisons between his readings and those of Kroeber
and Sand on the same twenty specimens are given. These only relate to 9
characters, not including orbital measurements, and a fairly satisfactory agreement
is indicated. 

Gifford’s material is divided up into a large number of small series representing 
subdivisions of counties and covering the whole of the state. In the majority of 
cases grouping of these is necessary in order to obtain samples large enough for 
statistical purposes. The following arrangement was adopted: 

(a) Northern counties: Gifford’s areas 1b, 2a, 3, 6e, 6f, 7d, 16a, 16b, 16c, 17a,
17b, 17c, 18a, 18c and 18d.

(b) Costanoan people: Gifford’s areas 19a, 19b, 19c and 19f. This is roughly
the same area as that from which Hrdlička’s “San Francisco Bay and vicinity”
series was obtained.

(c) Yokut people: Gifford’s areas 20a, 20b, 20e and 20g; Central California.

(d) Santa Rosa Island.

(e) Santa Cruz Island.

(f) Santa Catalina, San Clemente and San Nicolas Islands.


TABLE II
Reduced coefficients of racial likeness for male series of
Californian crania measured by Hrdlička*

                                 San Francisco   Santa           Santa           Santa           Santa Rosa
                                 Bay and         Barbara         Rosa            Cruz            and Santa
                                 vicinity        County          Island          Island          Cruz Islands
n̄                                26.3            43.7            19.7            64.3            84.1
San Francisco Bay and vicinity   ..              8.25 ± 0.75     8.56 ± 1.09     12.94 ± 0.66    13.96 ± 0.61
Santa Barbara County             8.25 ± 0.75     ..              3.38 ± 0.91     4.57 ± 0.47     4.43 ± 0.43
Santa Rosa Island                8.56 ± 1.09     3.38 ± 0.91     ..              -1.26 ± 0.82†   ..
Santa Cruz Island                12.94 ± 0.66    4.57 ± 0.47     -1.26 ± 0.82†   ..              ..

* All the coefficients in this table are based on the same fifteen characters (see Table I).
† The crude coefficient corresponding to this is -0.38 ± 0.25.


* The four series in question (whose means are in Table I) are those indicated as measured by Hrdlička alone.

Male means for series made up in this way are given in Table I.* Gifford also
gives measurements for other small series, but these come from scattered localities,
and it was felt that pooling of them to give sufficiently large samples would not
be justified.

* They are the six indicated as measured by Gifford alone.

Comparisons may be made first between the pairs of series made up from
Hrdlička’s and Gifford’s data which may be supposed to represent the same
populations. The “San Francisco Bay and vicinity” series of the former
corresponds with the Costanoan of the latter, and the crude coefficient of racial
likeness between them for 11 characters is 0.87 ± 0.29. It should be noted that
no comparisons of orbital measurements are included as no means for these are
available for the Costanoan series. The absence of any evidence of differentiation
is very satisfactory, and there can be no objection to pooling the two to form a
longer series. The other corresponding groups relate to Santa Rosa and Santa
Cruz Islands. A comparison of the means for these shows at once that the orbital
breadths (O₁′ R and O₁′ L) and index for Gifford’s Santa Cruz series differ very
significantly from those for Hrdlička’s Santa Cruz and Santa Rosa series. This
obviously suggests that two different definitions were followed in finding the
orbital breadths. No comparison of this character could be made in the case of
the San Francisco series, but it may be noted that the means for Gifford’s Santa
Catalina, San Clemente and San Nicolas Islands series are also markedly greater
than all Hrdlička’s. The latter defines his to be the dacryal breadth and his values
were accepted, while all orbital breadths and indices given by Gifford were omitted
in computing coefficients of racial likeness. These constants for the four island
series are given in Table III. Hrdlička’s Santa Rosa series is by far the shortest
of the four and all the lowest coefficients are found with it. Two of them differ
insignificantly from zero, while the third is significantly different from zero. The
remaining three series are approximately equal in length, so their coefficients
with one another may be supposed to measure the resemblances of the types, as
in the case of reduced coefficients of racial likeness. Some curious relationships
are found. Hrdlička’s Santa Cruz and Gifford’s Santa Rosa series are seen to
resemble one another more closely than either resembles Gifford’s Santa Cruz
series. An examination of the mean measurements in Table I shows that the type
of the last is so distinguished on account of its smaller size. Still omitting the
orbital breadth, every one of its means for absolute measurements is less than the
corresponding value for Hrdlička’s series from the same island. The divergence
of the two Santa Cruz series is thus probably due to sexing and not to differences
in the ways the measurements were taken, or to the fact that two different
populations are represented. For Hrdlička’s combined series from Santa Cruz
and Santa Rosa Islands and Gifford’s combined series from the same two islands
a crude coefficient of 4.50 ± 0.26 is found for 13 characters. All the means of the
absolute measurements for the former exceed those for the latter, and by far the
most significant difference is for the capacity (α = 32.9). In all these comparisons
no significant differences are found for the indices. It is certainly curious that
Gifford’s Santa Cruz series should be distinguished by the smaller size of its type,
not only from both Hrdlička’s but also from Gifford’s Santa Rosa series. The
hypothesis that differences in sexing are responsible for these relationships seems
to be a plausible one, and accordingly all four series were pooled for comparisons
with others in the hope that the resulting means (given in the penultimate column
of Table I) give as fair a representation of the male type of a homogeneous
population as any which could be obtained from the data available.


TABLE III
Crude coefficients of racial likeness for series of Californian
crania from Santa Rosa and Santa Cruz Islands

                         Santa Rosa*         Santa Cruz†         Santa Rosa‡         Santa Cruz§
                         (Hrdlička)          (Hrdlička)          (Gifford)           (Gifford)
Santa Rosa (Hrdlička)*   ..                  -0.38 ± 0.25 (15)   0.43 ± 0.29 (11)    1.58 ± 0.26 (13)
Santa Cruz (Hrdlička)†   -0.38 ± 0.25 (15)   ..                  1.71 ± 0.29 (11)    4.98 ± 0.26 (13)
Santa Rosa (Gifford)‡    0.43 ± 0.29 (11)    1.71 ± 0.29 (11)    ..                  2.35 ± 0.25 (14)
Santa Cruz (Gifford)§    1.58 ± 0.26 (13)    4.98 ± 0.26 (13)    2.35 ± 0.25 (14)    ..

* n̄ = 19.7 for 15 characters and 19.5 for 11 and 13.
† n̄ = 64.3 for 15 characters, 64.7 for 11 and 64.2 for 13.
‡ n̄ = 53.2 for 11 characters and 52.4 for 14.
§ n̄ = [illegible] for 13 characters and 52.6 for 14.

Having carried out the grouping described, six series of male Californian
Indian crania were made up from the measurements provided by Hrdlička and
Gifford. The means for these are given in Table I and the reduced coefficients of
racial likeness between them in Table IV. It should be noted that these last
constants are based on differing numbers of characters ranging from 11 to 18.
In all cases as many of the 31 characters (used when possible in calculating the
coefficients) as are available were employed, and it is to be regretted that some of
the numbers fall far short of this total. The difference between a reduced
coefficient based on 20 characters and another for the same two sets of means based on
30 characters is generally found to be of little account, but a change from 11 to 
18 characters is likely to be of far more consequence. The reduced coefficients in 
Table V were calculated with the object of gaining some idea of the effects that 
such a change may have. They are between the pooled series from Santa Cruz and 
Santa Rosa Islands and the five others. The second column gives values calculated 
only for the 11 characters common to all six series, and the third for those obtained 
when all possible characters are used in each case. The latter values are all less 
than the corresponding former ones, owing to the fact that the characters added 
tend to show less significant differences than the others, on the average. The 
reduction is only marked in one case, however, and the two sets arrange the five 
series in the same order. The use of differing numbers of characters for different 
coefficients is far from satisfactory, but if comparisons are to be made with data 
for other continental areas, it appears better to use all of the selected list available 
in each case, rather than a constant number considerably smaller than those which 
it has generally been possible to use. 

The connections between the series provided by the lowest orders of reduced
coefficients of racial likeness are shown in Fig. 1. There appears to be a fairly close
association between the relationships of the types and their geographical positions
in the case of five of the series, but the remaining one (from Santa Catalina, San
Clemente and San Nicolas Islands) is widely removed from the others. A
comparison of the means shows at once that the last type has high coefficients with
all the others chiefly on account of its greater calvarial length and lower cephalic
and height-length indices. Even if it be excluded, the Californian types show
greater diversity than is generally found for adjoining populations inhabiting a
small region. In particular the neighbouring Costanoan and Yokut groups are
far less similar than might have been expected.

Hrdlička, who had not measured any material from the southern islands,
concluded that “the material from California shows considerable uniformity”.
Gifford distinguished three main living types, one of which was divided into three
subtypes, and seven cranial types. The skull measurements do not appear to
justify such an arrangement. The one adopted in this paper has a geographical
basis, and it should be pointed out that far more adequate material would be
required to delimit accurately the groups distinguished in such a way. The groups
used here are obviously of provisional value only.


TABLE IV
Reduced coefficients of racial likeness for male series of Californian
crania measured by Hrdlička (H.) and Gifford (G.)*

                                                       Northern            Central             San Francisco Bay
                                                       California          California          and vicinity
                                                                           (Yokuts)            (including Costanoan)
Measured by                                            G.                  G.                  H. and G.
n̄                                                      48.1†               41.6‡               82.7§
Northern California                                    ..                  7.89 ± 0.57 (14)    12.35 ± 0.39 (14)
Central California                                     7.89 ± 0.57 (14)    ..                  18.43 ± 0.44 (14)
San Francisco Bay and vicinity                         12.35 ± 0.39 (14)   18.43 ± 0.44 (14)   ..
Santa Barbara County                                   13.47 ± 0.61 (11)   31.37 ± 0.66 (11)   7.85 ± 0.42 (15)
Santa Cruz and Santa Rosa Islands                      21.30 ± 0.34 (14)   37.15 ± 0.38 (14)   25.00 ± 0.21 (18)
Santa Catalina, San Clemente and San Nicolas Islands   92.08 ± 0.61 (14)   94.03 ± 0.65 (14)   52.39 ± 0.47 (16)

                                                       Santa               Santa Cruz          Santa Catalina,
                                                       Barbara             and Santa           San Clemente and
                                                       County              Rosa Islands        San Nicolas Islands
Measured by                                            H.                  H. and G.           G.
n̄                                                      43.7‖               157.8¶              35.4**
Northern California                                    13.47 ± 0.61 (11)   21.30 ± 0.34 (14)   92.08 ± 0.61 (14)
Central California                                     31.37 ± 0.66 (11)   37.15 ± 0.38 (14)   94.03 ± 0.65 (14)
San Francisco Bay and vicinity                         7.85 ± 0.42 (15)    25.00 ± 0.21 (18)   52.39 ± 0.47 (16)
Santa Barbara County                                   ..                  5.56 ± 0.35 (15)    70.10 ± 0.67 (13)
Santa Cruz and Santa Rosa Islands                      5.56 ± 0.35 (15)    ..                  56.53 ± 0.41 (16)
Santa Catalina, San Clemente and San Nicolas Islands   70.10 ± 0.67 (13)   56.53 ± 0.41 (16)   ..

* The n̄’s are the mean numbers of skulls for all the coefficients of racial likeness characters available. For some
comparisons it is not possible to use all these characters which are available and the n̄’s in these cases are given in the
footnotes below.
† n̄ = 49.1 for 11 characters, omitting LB, N∠ and A∠.
‡ n̄ = 42.2 for 11 characters, omitting LB, N∠ and A∠.
§ n̄ = 99.0 for 14 characters, omitting C, O₁′, O₂ and 100 O₂/O₁′; 87.1 for 15 characters, omitting LB, N∠ and A∠;
89.7 for 16 characters, omitting O₁′ and 100 O₂/O₁′.
‖ n̄ = 45.0 for 11 characters, omitting C, O₁′, O₂ and 100 O₂/O₁′; 43.6 for 13 characters, omitting O₁′ and 100 O₂/O₁′.
¶ n̄ = 171.4 for 14 characters, omitting C, O₁′, O₂ and 100 O₂/O₁′; 169.5 for 15 characters, omitting LB, N∠ and A∠;
166.8 for 16 characters, omitting O₁′ and 100 O₂/O₁′.
** n̄ = 36.2 for 13 characters, omitting LB, N∠ and A∠; 36.8 for 14 characters, omitting C and O₂.


TABLE V
Reduced coefficients of racial likeness based on different sets of
characters: male series of Californian crania

Reduced coefficients for          11 characters*   All available       Additional characters
Santa Cruz and Santa Rosa                          characters
Islands with ...
Northern California               24.97 ± 0.37     21.30 ± 0.34 (14)   LB, N∠, A∠
Central California                37.76 ± 0.42     37.15 ± 0.38 (14)   LB, N∠, A∠
San Francisco Bay and vicinity    37.47 ± 0.21     25.00 ± 0.21 (18)   C, LB, O₁′, O₂, N∠, A∠, 100 O₂/O₁′
Santa Barbara County              6.20 ± 0.40      5.56 ± 0.35 (15)    C, O₁′, O₂, 100 O₂/O₁′
Santa Catalina, San Clemente      64.78 ± 0.45     56.53 ± 0.41 (16)   C, LB, O₂, N∠, A∠
and San Nicolas Islands

* Viz. L, B, H′, J, NH, NB, G′H, 100 B/L, 100 H′/L, 100 B/H′, 100 NB/NH.


4. COMPARISONS OF SERIES OF MALE CRANIA OF ALGONKIN AND RELATED INDIANS


Another group of material which can be conveniently considered by itself is
provided by the Algonkin and Iroquois series for which measurements are
provided by Hrdlička in the 1927 section of his Catalogue. Most of these series
are too short to be treated singly, and a grouping of them on a geographical basis
in order to obtain large enough samples was hence necessitated. The pooling
which was first carried out can be seen from the headings of the columns in the
upper part of Table VI. The series Ia, Ib and Ic represent adjoining regions in
the extreme north-east of the country. The crude coefficients of racial likeness
between them for 14 characters (omitting the capacity) are: Ia and Ib, 0.70 ± 0.25;
Ia and Ic, 0.48 ± 0.25; Ib and Ic, 1.22 ± 0.25. Only the last of these can be supposed
significant, and as it still indicates very close resemblance, it was felt that the
pooling of the three series was advisable. The Iroquois series is only distinguished
from the other two by the fact that its mean nasal breadth is significantly greater.
The pooled means for the three series are given in the lower part of the table,
and they will be said to relate to the Algonkin: North-Eastern States series,
although the Iroquois do not belong to the Algonkin speaking peoples.


TABLE VI
Mean measurements of male cranial series referring to
Algonkin and related Indian tribes

States        Maine,           Huron, North-west   New York,         New Jersey        Ohio,
(tribes)      Massachusetts,   of New York         Manhattan         (Delaware),       Indiana,
              Connecticut,     (Iroquois)          Island, Long      Pennsylvania,     Michigan,
              Rhode Island                         Island, Staten    Maryland,         Illinois
                                                   Island            Virginia
Group         Ia               Ib                  Ic                IIa               IIb
C             1558.3 (3)       1524.4 (8)          ..                1529.4 (8)        1490.6 (25)
L             188.0 (45)       188.6 (33)          190.5 (42)        185.4 (48)        183.3 (46)
B             137.7 (45)       137.7 (33)          139.5 (42)        139.7 (48)        138.6 (45)
H′            137.9 (41)       138.9 (31)          140.4 (38)        141.5 (30)        141.6 (34)
G′H           75.2 (22)        74.8 (21)           73.8 (27)         72.8 (12)         74.8 (24)
J             137.5 (26)       138.4 (23)          138.8 (28)        139.9 (12)        140.3 (19)
NB            25.6 (31)        27.4 (26)           25.6 (32)         27.1 (17)         25.9 (35)
NH            52.3 (31)        53.5 (27)           52.3 (32)         52.7 (18)         53.8 (33)
O₂*           34.4 (33)        33.9 (25)           33.6 (29)         33.9 (22)         34.9 (29)
O₁′*          39.3 (33)        39.0 (23)           39.4 (29)         38.7 (22)         39.6 (29)
100 B/L       73.2 (45)        73.0 (33)           73.3 (42)         75.4 (48)         75.7 (45)
100 H′/L      {73.4 (41)}†     {73.6 (31)}         {73.7 (38)}       {76.3 (30)}       {77.3 (34)}
100 B/H′      {99.9 (41)}      {99.1 (31)}         {99.4 (38)}       {98.7 (30)}       {97.9 (34)}
100 NB/NH     49.5 (31)        51.5 (26)           49.1 (32)         51.3 (17)         48.5 (33)
100 O₂/O₁′*   87.5 (33)        86.8 (23)           85.4 (29)         87.7 (22)         88.1 (29)

States        Kentucky         Western: Wisc., Iowa,    North-Eastern     East-Central
(tribes)                       Miss. [illegible],
                               (Cheyenne),‡
                               (Chippewa),§ (Piegan)‖
Group         III              IV¶                      Ia + Ib + Ic      IIa + IIb
C             1432.5 (24)      1514.0 (41)              1533.6 (11)       1500.0 (33)
L             177.0 (34)       183.9 (49)               189.0 (120)       184.4 (94)
B             135.8 (34)       142.4 (49)               138.3 (120)       139.2 (93)
H′            139.5 (27)       135.0 (47)               139.0 (110)       141.6 (64)
G′H           70.4 (25)        72.9 (41)                74.5 (70)         74.1 (36)
J             136.0 (21)       141.7 (35)               138.2 (77)        140.1 (31)
NB            23.8 (26)        26.2 (46)                26.1 (89)         26.3 (52)
NH            50.9 (29)        53.6 (47)                52.7 (90)         53.4 (51)
O₂*           32.6 (30)        35.2 (39)                34.0 (87)         34.5 (51)
O₁′*          38.1 (27)        39.6 (39)                39.3 (85)         39.2 (51)
100 B/L       76.7 (34)        77.5 (49)                73.2 (120)        75.5 (93)
100 H′/L      78.8 (27)        73.4 (47)                73.6 (110)        76.8 (64)
100 B/H′      97.3 (27)        105.5 (47)               99.5 (110)        98.3 (64)
100 NB/NH     46.8 (26)        49.0 (46)                49.9 (89)         49.5 (50)
100 O₂/O₁′*   85.0 (27)        89.0 (39)                86.6 (85)         87.9 (51)

* The orbital measurements given for individual crania are the averages of the readings for the right and left sides.
† The mean indices in curled brackets were found from the means of the component lengths instead of from indices
for individual crania.
‡ From Kansas, Wyoming, Colorado and Nebraska.
§ From North Dakota and Michigan.
‖ From Montana.
¶ Omitting three deformed crania from Iowa.
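
As an arithmetical check on the pooling, a pooled mean can be approximated from the component means weighted by the numbers of skulls. This is a sketch only: the pooled means of Table VI were presumably computed from the individual measurements, so agreement is to be expected only within the rounding of the tabled values, and the function name is illustrative.

```python
def pooled_mean(series):
    """Weighted mean of (mean, number of skulls) pairs, with the total number."""
    total = sum(n for _, n in series)
    return sum(mean * n for mean, n in series) / total, total


# Calvarial length L for series Ia, Ib and Ic (upper part of Table VI).
length, skulls = pooled_mean([(188.0, 45), (188.6, 33), (190.5, 42)])
print(f"{length:.1f} ({skulls})")  # 189.0 (120)
```

The result agrees with the entry for L given for the pooled North-Eastern series in the lower part of the table.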

The series IIa represents States immediately to the south and on the eastern
seaboard, and IIb relates to four larger States to the west. A series from
Kentucky was excluded from the latter group because it obviously defines a different
type. The area covered by IIa and IIb together is much larger than that of the
north-eastern States. The two series give a crude coefficient of racial likeness of
1.07 ± 0.25 for the 14 characters, and no single character shows a significant
difference as the highest α found is 6.0. The coefficient differs significantly from
zero, but it indicates a very close resemblance. Accordingly, the series IIa and
IIb were combined and the pooled means (in Table VI) will be referred to as those
of the Algonkin: East-Central States series. The Kentucky series was treated
by itself, and the remaining Algonkin skulls for which measurements are given in
Hrdlička’s 1927 Catalogue were pooled to form the Western Algonkin series.
These last were obtained in ten States covering an area greater than that of all
the Algonkin States to the east of them put together. This pooled series is still a
small sample, and it is obvious that far more abundant material would be required
to delimit different regional types of population found among the Algonkin
speaking peoples. The partitioning of them adopted here is a pis aller, and, again,
it can only be considered to be of provisional value.
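
The quantity α quoted for single characters can be illustrated as follows. The sketch assumes the usual Pearson form, which this paper cites rather than states: for each character α = n₁n₂/(n₁ + n₂) × ((m₁ − m₂)/σ)², and the crude coefficient is the mean of the α’s less one. The standard deviation used below is a placeholder and not a value taken from the Egyptian E series, and the function names are illustrative.

```python
def alpha(n1, n2, mean1, mean2, sigma):
    """Assumed single-character term: n1*n2/(n1+n2) * ((mean1-mean2)/sigma)**2."""
    return n1 * n2 / (n1 + n2) * ((mean1 - mean2) / sigma) ** 2


def crude_coefficient(alphas):
    """Assumed crude coefficient: the mean of the alphas minus one."""
    return sum(alphas) / len(alphas) - 1.0


# Calvarial length L for series IIa and IIb (Table VI). sigma = 6.0 mm is a
# hypothetical standard deviation chosen only to make the example run.
a_length = alpha(48, 46, 185.4, 183.3, sigma=6.0)
print(round(a_length, 2))
```

On this form, two samples drawn from one population give α’s averaging about one, so that the crude coefficient is then near zero.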


Means for the four series finally adopted are given in the lower part of
Table VI. The Kentucky series is almost too short to use for statistical purposes,
but the other three are of fairly adequate lengths. The coefficients of racial
likeness between them are given in Table VII. It is surprising to find that there is
not a single one indicating a close resemblance. The lowest is for the adjoining
groups representing the North-Eastern and East-Central States, but this
indicates a greater divergence than that usually found between two neighbouring
populations. The aberrance of the Kentucky series is particularly striking, and
this is evidently due to the small size of its type. For all the absolute measurements
in Table VI except H′ the Kentucky series has by far the smallest mean, though
all its indices differ insignificantly from those for the series representing the
East-Central States. Our conclusions accord with Hrdlička’s so far as the statement
that the Iroquois “are radically of the same physical type with the Algonkins and
cannot be separated from the latter”, but his contention that “the extensive
Algonkin strains shows almost throughout a clear and distinct physical
character” is not confirmed.

Table VIII gives the reduced coefficients of racial likeness between the four
Algonkin series, on the one hand, and the six Californian, on the other. All the
values are high except one, and it must be remembered that little importance
should be attached to even marked differences between high coefficients. The
exception is for the comparison between the Algonkin series from the Western
States and the Central Californian series. A much closer resemblance is indicated
in this case than those between the former and the other Algonkin types.


TABLE VII
Coefficients of racial likeness between male cranial series
referring to Algonkin and related Indian tribes*

                      n̄      North-East      East-Central    Kentucky        Western
                             States          States                          States
Crude coefficients
North-East States     91.5   ..              8.89 ± 0.25     22.41 ± 0.25    16.90 ± 0.25
East-Central States   58.5   8.89 ± 0.25     ..              11.57 ± 0.25    11.93 ± 0.25
Kentucky              27.9   22.41 ± 0.25    11.57 ± 0.25    ..              22.92 ± 0.25
Western States        44.1   16.90 ± 0.25    11.93 ± 0.25    22.92 ± 0.25    ..
Reduced coefficients
North-East States            ..              12.46 ± 0.34    52.41 ± 0.58    28.40 ± 0.41
East-Central States          12.46 ± 0.34    ..              30.64 ± 0.65    23.74 ± 0.49
Kentucky                     52.41 ± 0.58    30.64 ± 0.65    ..              67.04 ± 0.72
Western States               28.40 ± 0.41    23.74 ± 0.49    67.04 ± 0.72    ..

* All the coefficients in this table are based on the 15 characters for which means are given in
Table VI. The n̄’s for the series are the mean numbers of skulls on which these means are based.


5. COMPARISONS OF OTHER UNITED STATES SERIES 

Mean measurements (given in Table IX) were calculated for the three following
groups from data given for male skulls in the 1927 section of Hrdlička’s Catalogue:

(a) Sioux proper: Miscellaneous 17, Teton 4, Brulé 15, Oglala 14, Sisseton 4, 
Yankton 5 and Montana 4. The pooling of this material is necessitated by the 
fact that the series for single tribes are all too short to make their treatment singly 
profitable. It is said that they are all closely related in physical type, and the 
mean measurements for the short series confirm this as far as can be seen. They 
are all characterized by a low basio-bregmatic height. The region represented is 
made up by the six most westerly states of the large region from which the skulls 
making up the Western Algonkin series were obtained. 

(b) Arikara, a Siouan tribe said to be nearly related to the Sioux proper, 54. 
This is a long enough series to be treated by itself. The skulls composing it were 
all obtained in South Dakota and some of the Sioux proper and Western Algonkin 
specimens came from the same State. The other series representing Siouan tribes, 
and the following ones relating to Caddoan, Salish and Sahaptin tribes, are all 
too short to be used. 
TABLE VIII
Reduced coefficients of racial likeness for Algonkin and Californian male cranial series*

                             Californian series
Algonkin series       n̄      Northern            Central             San Francisco Bay
                                                 (Yokuts)            and vicinity
                                                                     (including Costanoan)
                      n̄      49.1                42.2                87.1
North-East States     91.5   68.30 ± 0.44 (11)   56.03 ± 0.48 (11)   21.29 ± 0.28 (15)
East-Central States   58.5   51.03 ± 0.52 (11)   36.01 ± 0.57 (11)   21.22 ± 0.35 (15)
Kentucky              27.9   40.95 ± 0.80 (11)   49.39 ± 0.85 (11)   29.78 ± 0.58 (15)
Western States        44.1   25.04 ± 0.61 (11)   12.93 ± 0.66 (11)   99.55 ± 0.42 (15)

                             Californian series
Algonkin series       n̄      Santa Barbara       Santa Cruz          Santa Catalina,
                             County              and Santa           San Clemente and
                                                 Rosa Islands        San Nicolas Islands
                      n̄      43.7                169.5               36.2
North-East States     91.5   52.80 ± 0.42 (15)   78.03 ± 0.21 (15)   55.40 ± 0.51 (13)
East-Central States   58.5   52.64 ± 0.49 (15)   81.76 ± 0.28 (15)   100.36 ± 0.59 (13)
Kentucky              27.9   37.40 ± 0.72 (15)   71.88 ± 0.51 (15)   152.98 ± 0.84 (13)
Western States        44.1   31.66 ± 0.56 (15)   38.26 ± 0.35 (15)   59.35 ± 0.66 (13)

* The characters on which the coefficients in this table are based can be seen from a comparison of the
means available for the different series which are given in Tables I and VI, all the characters in the latter
table being used when possible. The n̄’s given are for all 15 characters in the case of the four Algonkin and
three of the Californian series, for 11 characters in the case of two others and for 13 characters in the case
of one other Californian series. In the comparison of the Algonkin with these last three series the Algonkin n̄’s
are slightly different from the values given for all 15 characters.

(c) Shoshonean: Bannock 1, miscellaneous unidentified tribes 8 (omitting 5 
deformed), Utes and Gosh-Utes 9, and Paiutes and Pah-Vants 6. The tribes of 
this group are said to form a fairly uniform type. (It should be noted that the 
female Blackfoot specimen and the Piegan series are included in the Shoshonean 
section of the Catalogue in error. The latter forms part of our Western Algonkin 
series.) The Shoshonean skulls were obtained in Colorado, from which a few of 
the Sioux proper and Western Algonkin skulls were obtained, and also from four
States to the west unrepresented by any other material dealt with above.

The majority of the skulls for which measurements are given in the 1931
section of Hrdlička’s Catalogue are artificially deformed, and the two following
series are the only ones of undeformed specimens which can be taken from it.

(d) Basket-maker, 33. These skulls of cave-dwellers in southern Utah are said
to form a remarkably uniform collection which cannot be subdivided into types.
A few of the Shoshonean specimens came from Utah.

(e) Old Zuñi, 35. These skulls were collected in Hawikuh village in New
Mexico. According to Hrdlička’s classification, they and the Basket-makers
represent the “dolichoid group” of the Pueblo peoples. There are no sufficiently
long series of undeformed crania in his Catalogue representing the
“brachycranic” Pueblo group. This last is represented by the following series recorded
by Professor Hooton.

(f) Pecos Pueblo. The total is divided into four groups representing different 
archaeological strata, and the whole period represented is probably rather less 
than one thousand years. The majority of the specimens in each subseries are 
artificially deformed. Means, standard deviations and coefficients of variation 
are provided for the “deformed” and “undeformed” crania in each archaeological
group, but the numbers for the latter kind are so small that a series long
enough for statistical purposes can only be obtained by pooling them. For most
of the facial characters the only constants provided are for the total series, as it
was assumed that these had not been modified by the calvarial deformation.
The means given in our Table IX are thus based on a short series of forty-six
skulls in the case of the calvarial measurements and on a longer one of 126 skulls
(including the forty-six) in the case of most of the facial characters. This is
unfortunate, as there is a possibility that the undeformed people did not represent
a random sample from the total population, and comparison of their mean facial 
measurements with those of the deformed series would have been of interest. As 
individual measurements are not provided it is not possible to investigate this 
question. It is shown in section 8 below that the Pecos Pueblo sample is peculiarly 
heterogeneous when compared with all the others considered, and hence it is not 
suitable for comparative purposes. Unusual variability is actually exhibited by 
its calvarial rather than by its facial measurements. 

(g) Florida. In his 1922 publication Hrdlička gives individual measurements
of a considerable number of crania of Florida Indians, from mounds and
shell-heaps, divided into several short series. A few of these are artificially deformed,
and measurements suspected to have been affected are enclosed in brackets.
All measurements not distinguished in this way were included by the author, and
by us, in computing means, although those of a few slightly deformed specimens
are thus used. He gives pooled means for all the skulls except those of Seminoles,
for this group omitting the Seminoles divided into two sub-groups (one being
composed of all the specimens with cephalic index above 80 and the other of
those with the index below 80), and for the Seminoles separately. The division
on the basis of the cephalic index is a purely arbitrary procedure. The small
Seminole series of eleven male skulls was apparently kept separate not because 
these specimens are clearly distinguished from the others on account of their 
appearance or measurements, but because the Seminoles are believed to have had 
a rather different origin from the other natives of Florida. The mean measure- 
ments do not lend support to this view. In order to obtain a larger series, the 
total material was divided into two, one being made up by all the crania from the 
west coast and the other by the remainder which includes the Seminole specimens. 
The male means found for these two groups are: 


              L            B            H’           100 B/L      100 H’/L
West coast    179.9 (78)   145.3 (78)   141.4 (55)   80.8 (78)    79.1 (55)
Others        180.0 (43)   143.4 (43)   141.3 (32)   79.7 (43)    78.2 (32)

              G’H          J            NH           NB           100 NB/NH
West coast    74.7 (44)    140.9 (40)   52.9 (47)    25.0 (47)    47.5 (47)
Others        72.7 (21)    139.4 (25)   52.1 (28)    25.3 (26)    48.8 (26)


By supposing that the standard deviations of the series are of the usual order, 
all the differences between these two sets of means are found to be insignificant. 
Accordingly they were combined and the pooled means, given in Table IX, were 
used for comparative purposes. It is shown below that the variability of this 
pooled series is quite unexceptional. We do not mean to assert that the Indian 
population of Florida represented was perfectly homogeneous from a racial 
point of view, but only that from the evidence available there appears to be no 
justification for partitioning it. More abundant material might make it possible 
to distinguish subsections of the population which could be differentiated. The 
absolute measurements for which means are given above are the only ones provided
for the Florida skulls, with the exception of the total facial height from
nasion to menton, and the series is metrically described less adequately than all 
the others dealt with in this paper. 
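
The pooling step described above can be sketched as follows. The means and numbers of skulls are those tabulated for the two Florida groups; the standard deviations are illustrative assumptions standing in for values "of the usual order" and are not taken from the paper, and the function names are hypothetical. Each difference of means is expressed in units of its standard error, and the pooled mean weights each group by its number of skulls.

```python
from math import sqrt

# Male means and numbers of skulls for the two Florida groups (from the table above).
west_coast = {"L": (179.9, 78), "B": (145.3, 78), "H'": (141.4, 55)}
others = {"L": (180.0, 43), "B": (143.4, 43), "H'": (141.3, 32)}

# ASSUMPTION: illustrative standard deviations in mm, not values from the paper.
assumed_sd = {"L": 6.0, "B": 5.0, "H'": 5.0}


def difference_ratio(m1, n1, m2, n2, sd):
    """Difference of two means divided by its standard error (common sd assumed)."""
    return (m1 - m2) / (sd * sqrt(1.0 / n1 + 1.0 / n2))


def pooled_mean(m1, n1, m2, n2):
    """Mean of the combined series, weighting each mean by its number of skulls."""
    return (m1 * n1 + m2 * n2) / (n1 + n2)


ratios = {}
pooled = {}
for character, (m1, n1) in west_coast.items():
    m2, n2 = others[character]
    ratios[character] = difference_ratio(m1, n1, m2, n2, assumed_sd[character])
    pooled[character] = pooled_mean(m1, n1, m2, n2)

for character in west_coast:
    print(f"{character}: ratio {ratios[character]:.2f}, pooled mean {pooled[character]:.1f}")
```

Under these assumed standard deviations the largest ratio is that for B, at about twice its standard error; different assumed values would change the ratios but not the pooled means.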

The reduced coefficients of racial likeness for all possible pairs of these seven 
series from Western and Southern States, and between them and the Californian 
and Algonkin series, are given in Table X, and they are discussed in the following 
section. The Pecos Pueblo series is included here although it is considered to be 
unsuitable for comparative purposes on account of its exceptionally great 
variability.


6. THE RELATIONSHIPS OF THE NORTH AMERICAN INDIAN SERIES
JUDGED FROM THE COEFFICIENTS OF RACIAL LIKENESS
AND A COMPARISON OF SINGLE CHARACTERS

All the reduced coefficients of racial likeness having values less than 19 
found between the sixteen series (omitting the Pecos Pueblo) are indicated in
Fig. 1. This also indicates approximately the localities from which the series 
were obtained. As has been pointed out in describing the material, some of these 
areas overlap to a considerable extent in the case of the Shoshonean, Sioux, 
Arikara and Western Algonkin collections. There are two of the series showing 
no connections of the order considered, viz. the Californian from Santa Catalina 
and two neighbouring islands and the Kentucky Algonkin. The former is ob- 
viously of a specialised type, as its calvarial length and cephalic and height- 
length indices are close to the extremes for all races in the world. It is not unusual
to find that an island population is of a distinctive type. The Kentucky Algonkin 
series is chiefly distinguished on account of the small size of nearly all its means 
of absolute measurements, and it is to be expected that some close connections 
with it would be found if more material were available for neighbouring popu- 
lations. 

In a general way, the closest resemblances are found between neighbouring 
peoples, as has been found in the comparison of other similar groups, but there 
are several exceptions to this. The most striking is found in the case of the four 
Algonkin series, which are remarkably dissimilar in type, and it must be concluded 
that the linguistic grouping has little ethnic significance. The close resemblance of 
the Florida and Central Californian types is also unexpected. It is shown in 
section 7 below, from comparisons with Asiatic material, that it is safer to 
neglect some of the higher reduced coefficients shown in Fig. 1. If no account is 
taken of any greater than 13, then the Shoshonean series also becomes isolated 
and the Basket-maker and Old Zuñi are detached from the Californian.

What we can assert at present is that there are marked divergences between 
various Indian tribes of the United States. How many races should be recognized 
cannot, we feel, be stated with precision. It may not be amiss to point out, 
however, that Professor von Eickstedt’s classification (1934, pp. 678-88) is 
fairly well in accord with our findings. His “margide Gruppe” is represented
by the Californian and Florida series. It is particularly interesting to find a bond
between the two. His “sylvide Gruppe”, however, has to be broken up into at
least two: the prairie Indians, represented by the Sioux and the Arikara, and the
Indians of the north-eastern forests, the Eastern Algonkin. The Kentucky group,
which is found to be isolated, indicates that there may be further races of which
we have insufficient knowledge at present. The “zentralide Gruppe” may be
represented by the Old Zuñi and Basket-makers, but these are dolicho- or
mesaticephalic (73.0 and 75.9, respectively), certainly not brachycephalic, as
von Eickstedt describes his group. It may be pointed out in passing that these
two types resemble quite closely that of the Peruvian skulls described by 
MacCurdy (1923). 

In making comparisons between the mean measurements for a number of 
cranial series, the relative extents to which different characters differentiate them 
can be estimated conveniently by comparing the percentages of significant
differences found. The question whether a particular difference is significant or
not can be judged from the α obtained for it in computing the coefficient of racial
likeness. An α is approximately the square of a quantity which is the difference
of two means divided by its standard error, and it may be arbitrarily supposed
to indicate differentiation if it is greater than 10. The percentages of α’s found
greater than 10 are given below for all the comparisons between the sixteen
American Indian series (omitting the Pecos Pueblo) and for all the comparisons
between 12 Oriental series:*


               100 H’/L   100 B/H’   100 B/L   H’     B      L      NH     O₂     NB
16 American    74.2       68.3       65.8      64.2   60.8   54.2   51.7   48.7   48.3
  series
12 Oriental    35.9       40.6       57.6      35.9   28.8   51.5   49.1   21.8   48.5
  series

               [?]        [?]        G’H       [?]    [?]    [?]    [?]    [?]    100 NB/NH
16 American    47.5       44.9       39.2      37.9   33.3   27.[?] 14.3   10.0   0
  series
12 Oriental    13.6       0          48.5      34.0   14.3   22.6   24.2   55.4   10.7
  series


These two sets of frequencies arrange the characters in rather dissimilar orders. 
For the American series the highest percentages are shown by the three major 
calvarial indices and the three diameters from which these are obtained: for the 
Oriental series these characters also give percentages among the highest, but they
are equalled or exceeded by those for the three nasal measurements and the 
upper facial height. For all of the 18 characters except NB, G’H, 100 NB/NH, LB
and A∠ the American percentage is greater than the corresponding Oriental
value, and in several cases markedly greater. There is, in fact, marked diversity 
among the Indian types of the United States compared with that normally 
found for comparable groups in other parts of the world. 
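
The α described above, and the percentage of α’s exceeding 10, can be sketched as follows. This is a minimal illustration that assumes both series share one standard deviation for the character concerned; all means, numbers of skulls and standard deviations in the example are hypothetical, as are the function names.

```python
def alpha(m1, n1, m2, n2, sd):
    """Squared difference of two means in units of its standard error.

    Assumes a common standard deviation `sd`, so that the squared
    standard error of the difference is sd**2 * (1/n1 + 1/n2).
    """
    return (n1 * n2) / (n1 + n2) * ((m1 - m2) / sd) ** 2


def percent_differentiated(alphas, limit=10.0):
    """Percentage of alphas greater than the arbitrary limit used in the text."""
    return 100.0 * sum(a > limit for a in alphas) / len(alphas)


# Hypothetical comparisons of one character between four pairs of series.
alphas = [
    alpha(180.0, 50, 184.0, 60, 6.0),  # about 12.1
    alpha(140.0, 50, 141.0, 60, 5.0),  # about 1.1
    alpha(135.0, 40, 131.0, 45, 5.0),  # about 13.6
    alpha(52.0, 50, 52.5, 60, 3.0),    # about 0.8
]
percentage = percent_differentiated(alphas)
print(f"{percentage:.1f} per cent of the alphas exceed 10")
```

Two of the four hypothetical α’s exceed 10, so the character would be credited with 50 per cent of significant differences.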


* These last percentages have been given by Woo and Morant (1932, pp. 130-1).

The arrangements provided by the means for single characters, or pairs of
characters, are far less suggestive than that given by the coefficients of racial 
likeness, and detailed consideration of them would not be profitable. 


7. COMPARISONS OF NORTH AMERICAN INDIAN WITH
ASIATIC AND ESKIMO CRANIAL SERIES


In preceding sections of this paper the coefficients found between all possible 
pairs of sixteen male series of crania representing Indian populations of the 
United States have been given. Interpretation of these generalised measures of 
the resemblances of the types will obviously be aided if the results of comparisons 
of the same kind between the Indian and other groups of series are also known. 
Such intergroup comparisons were made with Asiatic and Eskimo material. 

The coefficients have been given for all pairs of 26 male Asiatic series (Woo & 
Morant, 1932) and for all pairs of seven Eskimo series (Morant, 1937). In the 
paper on the latter comparisons were also made between them and the Asiatic 
series, though actually only one coefficient of this kind is given. Computation in 
full of the remaining 181 (182 = 26 × 7) was considered unnecessary because a
test applied showed that all these reduced coefficients were extremely likely to 
be greater than 19, and no account was taken of values greater than this in 
obtaining the classification of the Asiatic series. The test in question depends on 
the fact that for these groups the calvarial length, breadth and height and the 
three indices derived from these measurements gave percentages of significant
differences (indicated by α’s greater than 10) larger than, or almost as large as,
the percentages given by any other of the 31 characters used. The values of the 
coefficients were evidently determined largely by these six measurements, and 
it has been shown in section 6 above that the same is true for the North American 
Indian series. For the two groups of series the maximum differences between the
means found in the case of comparisons which give reduced coefficients of racial 
likeness less than 19 are: 


                        L     B     H’    100 B/L   100 H’/L   100 B/H’
26 Asiatic series       6.7   6.1   6.3   5.4       3.4        6.5
16 North American       7.4   7.2   6.5   3.6       3.5        5.0
  Indian series


Considering the Asiatic series alone, if any one of the 26 available could be com- 
pared with a new Asiatic series, and if one or more of the differences of the means 
for the six characters were found to exceed the limit given above, then it is 
unlikely that the reduced coefficient found would be less than 19. In the 
same circumstances, it is still less likely that one of the Asiatic and a non-Asiatic
series would give a reduced coefficient less than 19. These considerations make it
possible to select, by merely finding the differences of a few means, those pairs 
of series in new comparisons which will almost certainly provide reduced coeffi- 
cients greater than the limit (19) arbitrarily chosen. The ranges of the differences 
actually used for this purpose were those above for the Asiatic series with the 
addition of 0.1 to each, viz. L 6.8 mm., B 6.2 mm., H’ 6.4 mm., 100 B/L 5.5,
100 H’/L 3.5 and 100 B/H’ 6.6. After the pairs of series which will probably give
coefficients which will indicate greater dissimilarity than any to be used in the
classification have been selected in this way, we are left with a number of pairs
which may or may not give reduced values less than 19. It is not necessary to
calculate all these in full, since it can often be seen from the α’s for a few characters
only that a value greater than 19 will be obtained, so that there is no need to 
complete the computation. 
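
The preliminary screening described above can be sketched as follows. The limits are those quoted in the text; the first two sets of means are the Florida groups tabulated in section 5 (for which 100 B/H’ is not given), the third set is hypothetical, and the function name is illustrative. A pair is set aside as almost certain to give a reduced coefficient greater than 19 when any one of the differences reaches its limit.

```python
# Limits quoted in the text: the Asiatic maxima with 0.1 added to each.
LIMITS = {
    "L": 6.8,
    "B": 6.2,
    "H'": 6.4,
    "100 B/L": 5.5,
    "100 H'/L": 3.5,
    "100 B/H'": 6.6,
}


def characters_over_limit(means_a, means_b, limits=LIMITS):
    """Return the characters whose difference of means reaches the limit.

    Characters missing from either series are skipped. Differences are
    rounded to one decimal place, the precision at which means are quoted.
    """
    flagged = []
    for character, limit in limits.items():
        if character in means_a and character in means_b:
            difference = round(abs(means_a[character] - means_b[character]), 1)
            if difference >= limit:
                flagged.append(character)
    return flagged


west_coast = {"L": 179.9, "B": 145.3, "H'": 141.4, "100 B/L": 80.8, "100 H'/L": 79.1}
others = {"L": 180.0, "B": 143.4, "H'": 141.3, "100 B/L": 79.7, "100 H'/L": 78.2}
# HYPOTHETICAL series, invented for illustration only.
hypothetical = {"L": 187.0, "B": 141.0, "H'": 138.0, "100 B/L": 75.4, "100 H'/L": 73.8}

print(characters_over_limit(west_coast, others))        # no character reaches its limit
print(characters_over_limit(west_coast, hypothetical))  # L and 100 H'/L reach theirs
```

Only pairs for which the returned list is empty need be carried forward to the fuller computation.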

The twenty-six Asiatic give 416 comparisons in pairs with the sixteen North 
American Indian series. The test described shows that 318 of these coefficients 
are almost certainly greater than 19. Of the remaining ninety-eight, seventy-six 
were also found to be greater than the limit and it was not necessary to calculate 
them in full in order to be sure of this. The twenty-two reduced coefficients less 
than 19 are given in Table XI. It should be noted that connexions of the order 
considered are only found between seven Oriental types and the Chukchi, on the 
one hand, and twelve of the sixteen American types, on the other. Most of the 
southern Oriental series, all the Northern Mongolian (Siberian) and all the 
Indian series are excluded. The fact that the closest resemblances are between 
eastern and north-eastern Asiatic and the American populations is in accordance 
with expectation, but a moment’s consideration shows that little significance 
can be attached to the measures of resemblance which lead to this conclusion. 
The American series are thereby connected with the Oriental in what appears 
to be a haphazard way. For example, the Kentucky Algonkin series is linked 
to the Japanese (reduced coefficient = 17.0), while the lowest coefficient found
between it and the fifteen other American series is 27.7: the North-Eastern and
East-Central Algonkin series were found to be connected only with one another
when comparisons were confined to American material, but the former shows 
a connection with the Aino and the latter with the Japanese, Aino and Chinese 
Prehistoric series. Results such as these can only be considered so unreasonable 
that the assumption that the method used is capable of presenting the situation 
in such a way that it will be possible to unravel the skein of interrelationships 
seems to be discredited. 

There is the possibility, however, that the defect is due not to the method in 
itself but to the way in which it is being used. The fact that different high orders 
of reduced coefficients are not capable of indicating different degrees of distant 
relationship can easily be demonstrated. Hence it was concluded that only values 
below a certain limit should be considered. The limit chosen in the case of
comparisons of Asiatic series with one another was 19, because the arrangement
provided by all the values less than 19 appeared to be a reasonable and suggestive
one for them. The same may be considered true, as far as can be seen, for the
North American Indian series considered by themselves (see Fig. 1), but this is
not so when the cross connections between the Asiatic and American series are
considered. But it is still possible that the limit chosen is really too high and that
more reasonable results would be obtained if it were lowered.


TABLE XI

Reduced coefficients of racial likeness less than 19 between
North American Indian and Asiatic series of male skulls

North American Indian series: Northern California; Santa Barbara, California;
San Francisco Bay; Central California; Florida; Kentucky Algonkin; North-Eastern
Algonkin; Western Algonkin; Shoshonean; Arikara; East-Central Algonkin; Old Zuñi.

(The assignment of the coefficients to rows, and in places to columns, cannot be
recovered from this copy; they are listed in the order printed.)

Asiatic series (n̄)             Coefficients
Dayak (48.2)                    18.52 ± 0.53 (14)
Java (64.4)                     16.83 ± 0.46 (14)
Japanese and Aino               17.47 ± 0.39 (12)
                                12.18 ± 0.24 (16) | 10.82 ± 0.28 (16)
                                17.00 ± 0.54 (15)
                                16.43 ± 0.31 (13)
                                15.77 ± 0.46 (13)
                                13.26 ± 0.31 (15) | 15.96 ± 0.38 (13)
Tibetan A (35.9)                16.75 ± 0.62 (14)
                                16.24 ± 0.62 (15)
Chukchi (34.0)                  11.86 ± 0.73 (12)
                                6.83 ± 0.71 (12)
                                18.50 ± 1.01 (12)
                                2.25 ± 0.68 (12)
Fukien Chinese (36.0)           14.51 ± 0.45 (18)
                                15.67 ± 0.66 (14)
                                15.16 ± 0.62 (14)
                                [illegible]
Chinese Prehistoric (39.1)      17.73 ± 0.57 (…)
                                18.09 ± 0.76 (…)

Where more than one coefficient is given for a particular series, the n̄ for it in this table is the mean number
of skulls available for the coefficient of racial likeness characters in the case of the coefficient based on the largest
number of characters.


Before considering this question the results of comparisons between the
American Indian and Eskimo series may be given. There are sixteen in the
former and seven in the latter group and a comparison of the six calvarial measure-
ments suggested that seventy-eight of the 112 comparisons would give reduced
coefficients of racial likeness greater than 19. It was found that thirty-one of the
remaining thirty-four comparisons also give values above the same limit, leaving 
the following three reduced coefficients: Western Eskimo (220.0) and Arikara
(49.1), 7.07 ± 0.31 (15); Western Eskimo (220.0) and Western Algonkin (44.1),
15.91 ± 0.33 (15); Point Hope Eskimo (125.1) and East-Central Algonkin (58.5),
17.32 ± 0.31 (15).
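
The counts of comparisons quoted in this section can be checked with a short piece of bookkeeping. The figures are those given in the text; the function and argument names are illustrative only.

```python
def tally(series_a, series_b, screened_out, found_above, below_limit):
    """Check that the three classes of comparison account for every pair.

    screened_out: pairs set aside by the test on the six calvarial characters
    found_above:  remaining pairs found to exceed the limit of 19
    below_limit:  pairs giving reduced coefficients less than 19
    """
    total = series_a * series_b
    if screened_out + found_above + below_limit != total:
        raise ValueError("the classes do not account for all comparisons")
    return total


asiatic_total = tally(26, 16, screened_out=318, found_above=76, below_limit=22)
eskimo_total = tally(16, 7, screened_out=78, found_above=31, below_limit=3)
print(asiatic_total, eskimo_total)
```

Both sets of figures are internally consistent: 416 comparisons with the Asiatic series and 112 with the Eskimo series.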

The data in Table XI show that any attempt to take into account all reduced 
coefficients less than 19 is likely to be unprofitable when considering the classi- 
fication of the three groups of races considered. All the connexions between the 
series which remain when the limit is reduced to 13 are shown in Fig. 2, and this 
new limit has again been chosen arbitrarily merely because it appears to lead to 
the most suggestive arrangement. The position as far as the United States Indian 
series considered alone are concerned is little changed, except that the Shoshonean 
series has become isolated, and that the Basket-maker and Old Zuñi lose their
connexions with the Californian series. The arrangement of the Eastern Asiatic 
series is less changed and the continuous system which they form remains intact. 
There are only two connexions between the two groups, viz. those linking the 
San Francisco Bay series to the Aino and Japanese. There are closer relations 
between the American Indian types and those of the Western Eskimo and 
Chukchi populations, and these are somewhat unexpected in view of the fact that 
no data from Canada are available. 
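
The effect of lowering the limit can be sketched by treating each reduced coefficient below the limit as a link and collecting the series into connected groups. The four coefficients used here are ones quoted in the text of this section; the function name and the use of connected components are illustrative assumptions and not a procedure stated in the paper.

```python
from collections import defaultdict


def linked_groups(coefficients, limit):
    """Group series joined, directly or indirectly, by coefficients below `limit`."""
    graph = defaultdict(set)
    series = set()
    for first, second, value in coefficients:
        series.update((first, second))
        if value < limit:
            graph[first].add(second)
            graph[second].add(first)

    groups, seen = [], set()
    for start in sorted(series):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:  # depth-first walk over the links
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(graph[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Reduced coefficients quoted in the text.
coefficients = [
    ("Western Eskimo", "Arikara", 7.07),
    ("Western Eskimo", "Western Algonkin", 15.91),
    ("Point Hope Eskimo", "East-Central Algonkin", 17.32),
    ("Kentucky Algonkin", "Japanese", 17.0),
]

groups_at_19 = linked_groups(coefficients, limit=19)
groups_at_13 = linked_groups(coefficients, limit=13)
print(groups_at_19)
print(groups_at_13)
```

With these four coefficients the limit of 19 gives three linked groups, while the limit of 13 leaves only the Western Eskimo and Arikara connected and every other series isolated.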

The evidence suggests forcibly that in attempting to estimate relationships 
by these methods it will be safest to ignore all reduced coefficients of racial likeness 
greater than 13, as inconsistent results are likely to be obtained if significance 
is attached to differences indicating more distant degrees of resemblance. This 
restriction actually makes it necessary to discard certain suggestions which 
appeared to be of considerable interest. For example, if significance is attached 
to any reduced coefficient less than 19 then the Chukchi is found to have only one 
connexion with the Asiatic series* (viz. with the Chinese Prehistoric), and only one 
with the Eskimo (viz. with the Western Eskimo). A link is thus found between 
the two groups precisely where it would have been expected. But there can be 
no justification for accepting this result and at the same time refusing to interpret 
the evidence of the majority of the coefficients in Table XI in the same way. 

It should be decided, then, that the most suggestive classification is likely to 
be reached if no account is taken of any reduced coefficients greater than 13, but 
this limit may still be too high. If it is reduced to 10, say, the arrangement shown 
in Fig. 2 is broken up, as it were, into a few constellations of series having no 
connexions with one another, and in the case of the American Indian material of 
a number of isolated series as well. It may be anticipated that these last would 
become linked to one another to form a constellation if more series from the area 
were available, but no demonstration of this can be given at present. If the view
that a reliable classification can only be based on the evidence of the close resem-
blances of types be correct, then it is clear that a considerable number of series
representing a particular group of races must be available before it becomes
possible either to estimate their interrelationships, or to determine the links
between them and other groups. In the present case it is safest to conclude that
the cranial material available makes it possible to distinguish a few groups of
closely allied peoples among those of Eastern Asia and North America, but that the
connexions between the groups, and the affinities of a number of types which do not
fall within them, must remain undecided until new material clarifies the position.

* Excluding the short Tibetan B series which gives a reduced coefficient of 14.5 with the Chukchi.

The links found between a Californian type and the Aino and Japanese are 
suggestive, but little importance can be attached to them at present. The results 
of the cranial comparisons appear to be in favour of the hypothesis which postu- 
lates an immigration into the American continent via the Straits of Bering. The 
Chukchi may then be considered as a tribe left in Asia during this migration, and 
the resemblance between the Arikara and the Western Eskimos may be taken as 
an indication of a former contact between the two races. The links between the 
Californian and the Oriental types may be an indication of the same route of 
migration rather than a sign of direct trans-Pacific traffic. Japanese and American 
Indians are too far removed as regards the colour of their skin and other integu- 
mentary characters to make this link appear plausible. It must be remembered 
that neither the Japanese nor the Californian series of skulls used in these com- 
parisons are adequately described by the measurements given for them, and 
better material might lead to rather different conclusions. 

It may be noted that there are no characters for which means are available 
which distinguish all the Asiatic from all the North American Indian types, 
though many of the most significant differences between them are due to the 
fact that the latter tend to have the broader and higher facial skeletons. The 
length, breadth and height of the brain-box and the three indices derived from 
these measurements are remarkably similar for some of the pairs of series, but 
in the cross comparisons only one case was found—viz. the Northern Californian 
compared with the Fukien Chinese series—for which all the differences of the 
means for the six calvarial characters are insignificant. The same is found for the 
Western Eskimo, on the one hand, compared with the Arikara and Western 
Algonkin, but all the Eskimo types have decidedly lower nasal indices than all 
the North American Indian. 


8. THE VARIABILITIES OF SERIES OF NORTH AMERICAN INDIAN SKULLS


In the foregoing sections of this paper comparisons are made between the 
means of 17 male series of crania which were finally selected for the purpose, and 
all these relate to Indian populations of the United States. The means for these 
are given in Tables I, VI and IX. The Kentucky (Table VI) and Shoshonean 
(Table IX) series were considered to be too short to give estimates of any value
of the variabilities of the populations they represent. The standard deviations for
the remaining 15 series are provided in Table XII, omitting a few values which 
can only be given for fewer than thirty specimens. 

In comparing these constants two groups of the series will first be considered 
separately. The first consists of six from California. Taking each character in 
turn, the differences between all possible pairs of the standard deviations were 
estimated in terms of their probable errors, and these ratios will now be supposed 
to indicate significant deviations if they are greater than 3.5. For 9 of the 14
characters concerned the constants for the Californian series show no differences
of this order, for L there are found to be 2 significant differences, for NH 2, for
B 4, for G’H 4 and for 100 B/L 5. Of the total seventeen ratios greater than 3.5,
the largest is 5.5 and there are only four greater than 5.0. It should be remembered
that in a set of ratios of the kind considered some values greater than the limit 
chosen must be expected owing to chance. In comparing different pairs of the 
Californian series, the numbers of characters which can be used range from 8 to 11. 
There are four pairs of series showing no significant differences, six showing 1 
only, four showing 2 only and one showing 3 significant differences only. It is 
clear that there is no evidence to show that the six Californian populations repre- 
sented differed substantially in variability, and, in view of the danger that the 
small samples available may not have been drawn entirely at random, it appears 
safest to conclude that these populations all exhibit the same degree of variation. 
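
The comparison of two standard deviations in terms of probable errors can be sketched as follows. The sketch assumes the usual large-sample normal-theory approximation, in which the probable error of a standard deviation σ based on n skulls is 0.67449 σ/√(2n); the paper does not state the formula it used, and the numerical values and function names below are hypothetical.

```python
from math import sqrt

PROBABLE_ERROR_FACTOR = 0.67449  # probable error = 0.67449 x standard error


def probable_error_of_sd(sd, n):
    """Approximate probable error of a standard deviation (normal theory assumed)."""
    return PROBABLE_ERROR_FACTOR * sd / sqrt(2 * n)


def sd_difference_ratio(sd1, n1, sd2, n2):
    """Difference of two standard deviations divided by its probable error."""
    pe_difference = sqrt(probable_error_of_sd(sd1, n1) ** 2
                         + probable_error_of_sd(sd2, n2) ** 2)
    return abs(sd1 - sd2) / pe_difference


# HYPOTHETICAL standard deviations (mm) and numbers of skulls.
ratio = sd_difference_ratio(6.5, 100, 5.0, 80)
significant = ratio > 3.5  # the limit adopted in the text
print(f"ratio = {ratio:.2f}, significant = {significant}")
```

For these hypothetical values the ratio is about 3.7, which would just count as a significant difference under the limit of 3.5.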

The second group of series referred to is made up by the following eight in 
Table XII, viz. all the “Other U.S.A. series” except the Pecos Pueblo. A similar
treatment leads to precisely the same conclusion in this case. In a total of 325
comparisons, the difference exceeds 3.5 times its probable error in thirty-two
instances, there are three ratios greater than 5 and the largest is 7.2. Comparing
the series in pairs (the numbers of characters which can be used range from 9
to 13) there are found to be eight pairs showing no significant differences,
eleven showing 1 only, six showing 2 only and three showing 3 significant differ-
ences only. A close approach to equality in variation is again indicated.

The Pecos Pueblo series was not included in the second group because its 
standard deviations are obviously peculiar. For 7 of the 13 characters they are 
greater than all the corresponding values for the other fourteen series. Little 
importance can be attached to the fact that the Pecos Pueblo standard deviation 
is extreme in the case of the capacity (C) and orbital index (100 O₂/O₁), but this
is not so for the remaining 5 characters for which the situation (for fourteen
comparisons) is:

L, Pecos Pueblo σ significantly greater than 12 others and highest ratio 6.6;

B, Pecos Pueblo σ significantly greater than 3 others and highest ratio 5.5;

H’, Pecos Pueblo σ significantly greater than 4 others and highest ratio 5.2;

100 B/L, Pecos Pueblo σ significantly greater than 6 others and highest
ratio 6.1;

O; (9 comparisons), Pecos Pueblo σ significantly greater than 6 others and
highest ratio 6.1.

The Pecos Pueblo series is evidently appreciably more variable than any of 
the others, and as it differs from them in this respect it must be supposed unsuit- 
able for purposes of racial comparison. Its peculiarity may be due either to the 
fact that the measurements selected because they were believed to have been 
unaffected by artificial deformation were not uninfluenced by this disturbing 
factor, or to the fact that the population represented was racially more hetero- 
geneous than all the others. 

Comparisons were not made between the variabilities of pairs of the series 
of which the first belongs to the Californian and the second to the other group of 
series distinguished, as it is of more interest to compare the two groups in another 
way. The average standard deviations for the six Californian series given in the 
fourth column from the end of Table XII were obtained by weighting the squares 
of the constants for the single series with the numbers of skulls on which they are 
based. The following column gives averages computed in the same way for the 
eight other Indian series, excluding the Pecos Pueblo. It is clear that these two 
sets of average values show a much closer approach to equality than is shown by
the standard deviations for almost all pairs of the component series. Probable 
errors for the average constants have not been computed, but it is probable that 
most of the differences between the two sets for corresponding characters are 
quite insignificant. For ten characters the Californian values are in excess and 
for four in defect of the others, but the absolute differences between the constants 
are all small, and these relations can only be taken to indicate that the Californian 
populations show a slight tendency to be more variable than other Indian 
populations of the United States. 
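
The averaging described above, in which the squares of the standard deviations are weighted by the numbers of skulls, can be sketched as follows. The values in the example are hypothetical and the function name is illustrative.

```python
from math import sqrt


def average_sd(series):
    """Average standard deviation from (sd, n) pairs.

    The squares of the standard deviations are weighted by the numbers
    of skulls, and the square root of the weighted mean is returned.
    """
    total_n = sum(n for _, n in series)
    weighted_squares = sum(n * sd ** 2 for sd, n in series)
    return sqrt(weighted_squares / total_n)


# HYPOTHETICAL standard deviations (mm) and numbers of skulls for two series.
example = [(5.0, 30), (6.0, 70)]
average = average_sd(example)
print(f"average standard deviation = {average:.2f}")
```

Because the squares are averaged, the result (about 5.72 here) lies slightly above the simple weighted mean of the standard deviations (5.70).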

The penultimate column of the table gives the average standard deviations, 
computed in the way described, for all 14 of the Indian series, still excluding the
Pecos Pueblo. These may be compared with the values in the last column for 
Egyptian skulls obtained from a single cemetery at Gizeh used from the 26th to 
30th dynasties.* Probable errors for these last are given, and, in view of the total 
numbers of skulls on which they are based, it will be safe to assume that for corre- 
sponding characters the American averages will have probable errors either of 
the same order or rather greater than the Egyptian. On this assumption, there 
seems to be no reason to suspect that the differences between the two sets of 
standard deviations are clearly significant except in the case of the capacity and 
three orbital measurements (for which the Egyptian values are the greater) and 
the bizygomatic breadth (J) and cephalic and nasal indices (for which the 
American values are the greater). But even in these cases the absolute differences 
between the corresponding constants are small, and the use of the Egyptian values 
in computing coefficients of racial likeness between American Indian series seems 
to be sufficiently justified. 


* The Egyptian standard deviations are taken from Pearson & Davin (1924).


9. CONCLUSIONS

This paper presents a preliminary classification of the Indian races of the 
United States derived from the mean measurements of groups of undeformed male 
adult crania. The data provided by Gifford and Hrdlička were found to be the
only ones suitable for the purpose. The total 1167 skulls were divided into sixteen 
series—three being made up by fewer than forty specimens each—for which 
means are given. Judging from the standard deviations (Table XII) the sixteen
selected series indicate a remarkably close approach to equality in intra-racial 
variability, and only one other (the Pecos Pueblo) had to be rejected because it 
appears to represent a decidedly more heterogeneous population. The average 
standard deviations for the sixteen series are found to be remarkably close to those 
of a long series of late dynastic Egyptian crania, and this order of variability 
is rather less than that found for modern series of crania from Western Europe. 

Comparisons between the types of the series were made by applying the 
method of the coefficient of racial likeness, the classification suggested being 
derived solely from the lowest orders of reduced coefficients. When possible these 
constants are based on thirty-one cranial characters, but for the American 
material they can only be computed for numbers between 11 and 18, since several 
of the customary measurements are not available. This limitation is unfortunate, 
but there is no reason to believe that the orders of the reduced coefficients obtained 
are different from those which would be given if all thirty-one characters could
be used. 

All the values less than 19 found are indicated in Fig. 1. Comparisons were 
also made between the sixteen North American Indian series, on the one hand, 
and Oriental and Eskimo series on the other, and all the reduced coefficients less 
than 13 within and between these three groups are shown in Fig. 2. There are 
several other connexions between the United States and Oriental series provided 
by values between 13 and 19, but it is suggested that no significance should be 
attached to these, and hence that no account should be taken of reduced co- 
efficients greater than 13 in classifying the American series. Owing to the com- 
plexity of the problem, it was to be anticipated that the way in which a generalized 
criterion of resemblance, such as the coefficient of racial likeness, can best be 
used to furnish a classification of racial types must be determined empirically. 
The contention that the most suggestive results are obtained by considering the 
evidence of close resemblances only is fully sustained by the present investigation, 
but the limiting order of resemblance which can best be used may have to be 
modified again in the light of more abundant material. If the evidence of all 
reduced coefficients less than 13 is taken into account, then the only connexions
between the North American and Oriental types are the links between a Cali-
fornian series and the Aino and Japanese. If it should be found necessary to
reduce the limit again—to 10, say—then for the existing material there will be 
no connexions between the two groups, though it is probable that some would
be provided by populations unrepresented at present. The fact that the Chukchi 
is closely allied to some North American types but not to any of the available 
Asiatic types is unequivocal, and there are close bonds between the Western 
Eskimo and the United States types. A surprising diversity is found among the 
Indian populations of the country, and this is equally apparent whether the 
coefficients of racial likeness are considered, or the mean measurements are com- 
pared in any more direct way. On this account, it will be necessary to have con- 
siderably more material than that available at present in order to reveal their 
interrelationships in a completely satisfactory way. Comparison of the results 
obtained already with those which might be derived from more adequate metrical 
descriptions of the same material is also required. 


APPENDIX 
NEW SERIES OF AMERICAN INDIAN CRANIA 

Shortly after the paper above had been written, measurements were published 
of new series of American Indian crania excavated from mounds in Fulton County, 
Illinois (Cole & Deuel, 1937). The report on them is said to be an interim one only, 
but the measurements provided are more detailed than those for nearly all the 
United States Indian series described previously. The artefacts found with the 
skeletons made it possible to distinguish six cultural divisions extending from 
some pre-Columbian date to the seventeenth or eighteenth century, though no 
objects suggesting contact with Europeans were discovered. 

Our means computed from the individual measurements of male undeformed 
skulls are given in Table XIII for the following groups: 

(i) Mounds 14 and 34 (table facing p. 264)—late in date. It is said that the 
skulls from these two mounds are “very closely related, permitting the pooling 
of the craniometric data’’. 

(ii) Mounds 85 and 86 (table facing p. 264)—late in date and following or 
contemporaneous with (i). It is said that these skulls do not differ markedly from 
those in the first group. 

(iii) All other mounds, viz. 7, 10, 11, 12, 13, 14, 15, 77 and 188 (tables in text),
earlier in date. A number of types are distinguished among these skulls but the
total is very small. 

Mean measurements for these three groups are given in Table XIII. In the 
case of the majority of the characters considered there, it is clear that all the 
differences are quite insignificant, though even if this were so for all of them it 
would not provide good evidence of identity of type owing to the small sizes of 
the series. Differences which are probably significant are only found for the basio- 
bregmatic height (H’) and the two indices involving this chord. But little stress 
can be laid on this fact, as so few individuals are represented that the samples are 
particularly unlikely to be truly random ones representing large populations or
a single large population. Owing to the limitations of the evidence, it appeared
best to pool the three series in the hope that it might represent a single racial
group, and accordingly the means and standard deviations for the total series
given in Table XIII were computed. Comparisons of its variabilities, however, show
that this sample cannot be supposed racially homogeneous. These can be made
with the average standard deviations for 14 cranial series given in Table XII in
the case of 13 characters, supposing that the variabilities of the orbital breadth
and indices determined in the different ways are comparable, since they give
almost identical standard deviations when found for the same series. For 11 of
the 13 characters the Fulton standard deviations are in excess of the average
values and two of the differences in these cases are probably significant. The other
two, for which the position is reversed, are quite insignificant. The total series
must hence be supposed racially heterogeneous. Standard deviations were also
computed for the two later series (mounds 14, 34, 85 and 86) alone, but hetero-
geneity is still indicated, as 10 of these values out of the 13 exceed the average
values for the 14 series.


TABLE XIII

Mean measurements of series of male crania from Fulton County,
Illinois, and standard deviations for the total series*

                        L            B            H’           LB           B’           J            G’H
Mounds 14 and 34        180.1 (27)   140.0 (27)   145.6 (24)   105.5 (24)   94.6 (27)    140.4 (22)   75.0 (26)
Mounds 85 and 86        182.5 (13)   137.3 (13)   140.7 (12)   105.3 (12)   92.8 (13)    136.5 (13)   74.0 (13)
Other mounds            183.2 (18)   140.1 (17)   138.5 (8)    102.3 (8)    94.8 (18)    140.0 (12)   73.0 (12)
Total series            181.6 (58)   139.4 (57)   143.0 (44)   104.8 (44)   94.3 (58)    139.2 (47)   74.5 (51)
σ’s for total series    6.77±0.42    4.40±0.28    5.24±0.38    4.98±0.36    -            5.86±0.41    3.57±0.24

                        NH           NB           O₁L          O₂L          G₁’          G₂           100 B/L
Mounds 14 and 34        53.5 (27)    27.0 (26)    43.5 (23)    34.4 (26)    47.9 (26)    40.4 (24)    77.8 (27)
Mounds 85 and 86        53.2 (13)    26.0 (13)    41.8 (13)    34.8 (13)    47.9 (13)    40.1 (12)    75.4 (13)
Other mounds            53.3 (12)    26.0 (12)    43.0 (11)    34.9 (12)    47.4 (8)     39.7 (8)     76.8 (17)
Total series            53.4 (52)    26.5 (51)    42.9 (47)    34.6 (51)    47.8 (47)    40.2 (44)    76.9 (57)
σ’s for total series    3.10±0.21    1.87±0.12    1.92±0.13    1.95±0.13    -            -            3.83±0.24

                        100 H’/L     100 B/H’     100 NB/NH    100 O₂/O₁, L 100 G₂/G₁’   Prosth. P∠
Mounds 14 and 34        81.2 (24)    {96.2 (24)}  50.4 (26)    79.5 (23)    84.5 (24)    83.2° (27)
Mounds 85 and 86        77.3 (12)    {97.6 (12)}  49.1 (13)    83.2 (13)    84.8 (12)    84.5° (13)
Other mounds            76.5 (8)     {101.2 (8)}  49.0 (12)    80.4 (13)    83.7 (7)     84.0° (12)
Total series            79.3 (44)    {97.5 (44)}  49.7 (51)    80.7 (49)    84.5 (43)    83.7° (52)
σ’s for total series    -            -            4.55±0.30    4.39±0.30    -            -

* The measurements of the Fulton skulls were determined by using Martin’s definitions. The symbols for them
used here and listed in the footnote to p. 95 above may be taken to indicate exact correspondence with the definitions
followed in determining the measurements of the other series used in this paper. Those in this table not available for
any of the other series are: B’ = minimum frontal breadth (Martin’s No. 9), O₁L = breadth of orbit from maxillo-
frontale (51), G₁’ = length of palate from staphylion to orale (62), G₂ = breadth of palate between the mid-points of the
inner alveolar walls of the second molars (63) and Prosth. P∠ = angle between chord joining nasion to prosthion and
the Frankfort horizontal plane (72).


In spite of the unsatisfactory nature of the total Fulton series, it was thought 
worth while computing a few coefficients of racial likeness between it and some 
other series of American Indian crania used in this paper. A comparison of a few 
means showed at once that nearly all of the 16 would give reduced values greater
than 19 and this was confirmed in four doubtful cases leaving only one below
the limit, viz. Fulton (x = 50.6) with Algonkin East-Central (61.9), reduced
C.R.L. = 4.42 ± 0.49 for 12 characters. The fact that this close resemblance is
found between two series from the same region is obviously suggestive, but the 
inadequacy of the new data must not be forgotten. It is to be hoped that the 
data for additional skulls from Fulton County which are said to be available 


will make it possible to determine the relationships of the populations represented 
in a more satisfactory way. 
A Description of Nine Human Skulls from Iran excavated by
Sir Aurel Stein, K.C.I.E.


By G. M. MORANT 


THERE are few parts of the world which are less known from a craniological point of view 
than Iran. The total number of skulls from the country preserved in European collections 
appears to be less than fifty and no measurements for a series of any length have been 
published.* The specimens described below were obtained by Sir Aurel Stein during two 
of his archaeological expeditions, and I am indebted to him for granting me permission 
to examine them. The material is not extensive enough to justify any statistical comparisons 
of the measurements, and the object of this note is to place on record particulars of the 
provenance of the skulls and a description of their characters. 

Nos. 1 and 2 were excavated by Sir Aurel Stein on his Third Persian Expedition (1933-4) 
and the remaining seven on his Fourth Persian Expedition (1936-7). A published account 
of the discovery of the first skull is quoted below and particulars of the others are from his 
unpublished records. The condition of the specimens can be seen from the photographs 
(Plates I-III). Dehbid is in the province of Fars and it may be said to belong to Central
Iran; Bampur is in the south-east of the country (Persian Baluchistan) and Dinkha and
Hasanlu are in the extreme north-west, near Lake Urumiyeh. The sites are thus widely 
separated except the last two which are about 50 miles apart. 


(1) Skull of an infant found in an artificial mound at Dehbid. 

“On excavation the mound yielded throughout abundant painted potsherds, worked
stones and associated objects from the chalcolithic period of occupation. These were found
at depths 1 to 4 ft. below the surface level. In section iii at a depth of 1 ft. were discovered
the remains of a partial burial, comprising the neatly trepanned skull of a woman or child,
a lower jaw and a small quantity of bone fragments lying close to it. A small carved stone
pendant representing a clenched hand subsequently turned up in the same section and
depth, but a little farther off. This resembles so closely a number of similar pendants found
in one of the Sasanian burial cairns of Bishezard that a strong probability suggests itself
of the partial burial near which it was found being intrusive, i.e. having been placed within
the chalcolithic debris layer at the foot of the mound in historical times.” (Stein, 1936.)

The excellent and fresh condition of the cranial bones renders it extremely probable 
that the burial was intrusive, and that it belongs not only to historical but also to very 
recent times. The child probably died in the third year of life. The milk dentition was 
completely erupted and the crypts for the first and second permanent molars were open in 
both jaws, with the crowns of these teeth formed but not erupted as far as the alveolar 
margins. The basi-occipital and right exoccipital bones are missing, while the suture 
between the left exoccipital and the supra-occipital is open except for a length of 1 cm.
where it is synostosed. The hole in the left parietal (see Plate III B) was almost certainly 
made after death by a blow from a pointed weapon or tool. Its edge shows no sign of 
separation and the rondelle of bone forced out is still attached to the endocranial surface. 


(2) Skull of an adult male from Bampur. 
“The skull marked ‘Bampur B + 5 feet’ was found in a grave on the top of a prehistoric
mound near Bampur fort in Persian Baluchistan. It is in all probability mediaeval and
may have been that of some Baluch belonging to the same tribe as now forms the population
of that territory.”

* The longest published series appears to be that of eleven skulls for which measurements and
descriptions were provided by the late Dr Viktor Lebzelter (1931).

This well-preserved skull has clear male characteristics. The coronal suture is just 
beginning to close. The teeth are considerably worn, eight had been lost before death and 
three in situ are reduced to stumps owing to caries. The upper left canine was formed but 
unerupted, its tip being on a level with the alveolar margin. 


(3) Cranium of an infant from Dinkha. 

“The skull from Dinkha was found in a tomb excavated by the eroded side of a large
mound occupied in chalcolithic times. The site lies in the large valley of Ushnu, between the 
south-west shore of Lake Urumiyeh and the main Zagros range forming the boundary 
between Persian and Iraq Kurdistan.” 

The specimen consists of a calvaria with the base partly defective and the greater part 
of the right side of the upper facial skeleton. The child probably died in the third year of 
life. Judging from the right side of the upper jaw, the milk dentition was completely 
erupted and the crypts for the first permanent molars were open, with the crowns of these 
teeth formed but not erupted as far as the alveolar margin. The basi-occipital and right 
exoccipital bones are missing, and the suture between the left exoccipital and the supra- 
occipital is half obliterated. 


(4) Calvaria of a child from Hasanlu: Hasan. A. 

“The six skulls from Hasanlu came from burials of a late chalcolithic period found in an
extensive ancient graveyard adjacent to a very large mound near Hasanlu village, some
6 miles to the south of the southern shore of Lake Urumiyeh. The burials comprised com-
plete bodies, all parts except the skulls being much injured. The dead had been buried at
depths varying from about 8 to 12 feet. The furniture, mainly pottery, was fairly uni- 
form.” 

The bones of specimen A from this cemetery are remarkably fresh. The basal suture is 
completely open and an age at death between 5 and 10 years is suggested by the form and 
size of the calvaria. 


(5) Skull of an adult female from Hasanlu: Hasan. B. 

The coronal suture is beginning to close while the sagittal and lambdoid are open. The 
greater part of the vault was affected by a pathological condition, the ectocranial surface
being rugose and in places exceptionally thin, especially at the obelion. Within the area
affected the sutures (including the whole of the sagittal suture) are far simpler than usual
(see Plate III D). The vault is asymmetrical, the right side being higher than the left (see
Plate II B). There is fronto-temporal articulation on both sides. The two upper central 
incisors were the only teeth lost before death and there is a large abscess cavity at the site 
of the right tooth (see Plate II B). The upper left canine is small and peg-shaped and one 
premolar is reduced to a stump owing to caries. The teeth are considerably worn. 


(6) Skull of an adult from Hasanlu: Hasan. C. 

In spite of its large size, this specimen is probably female, the superciliary ridges and 
transverse occipital lines being feebly developed. The calvarial sutures are completely 
open. The right central incisor was the only tooth lost from the upper jaw before death. 
A premolar and a molar had been lost from the lower jaw and no third molars had erupted. 
The teeth in both jaws are considerably worn and three had been reduced to stumps 
owing to caries. There is a large abscess cavity at the socket for the root of the upper right 
lateral incisor (see Plate II D). The mandible is exceptionally small and feeble for the 
cranium. 


(7) Skull of an adult male from Hasanlu: Hasan. D. 

This is a well-developed and muscular specimen. The calvarial sutures are completely 
open. No teeth had been lost before death and the upper left third molar is absent. One 
upper molar is markedly eroded by caries and the teeth are moderately worn. 
(8) Cranium of an adult female from Hasanlu: Hasan. F.

The lambdoid suture is closing, the sagittal is beginning to close and the coronal is 
open. No teeth had been lost from the upper jaw before death. The teeth are considerably 
worn. The right side of the palate has a rugose surface and two cavities due to disease. 


(9) Calotte of an adult female from Hasanlu: Hasan. G. 

The coronal and sagittal sutures are beginning to close on the external surface and 
nearly obliterated on the internal; the lambdoid is beginning to close on the external and 
half obliterated on the internal surface. 


It may be noted that all of the five adult specimens with one or both jaws extant ex- 
hibit some form of dental disease, while the age at death for the oldest of these people was 
probably under 35 years. This would not have been anticipated as the teeth and palates 
of late prehistoric skulls are usually found to be better preserved than those of modern 
man. Judging from a qualitative comparison, and making allowances for age and sex, 
eight of the total nine specimens do not show greater differences than those which might 
well be found in a sample of such a size selected from a racially homogeneous population. 
The remaining skull is the modern one from Bampur in Persian Baluchistan and it appears 
to be distinguished from the others chiefly by the form of its facial skeleton, though it also 
has the highest cephalic index. 

Measurements are provided in Tables I and II, the usual biometric symbols denoting 
these being given and also the numbers in Rudolf Martin’s list. There is nothing par- 
ticularly remarkable in these data, but it may be noted that if the specimens are considered 
as a single series the type is decidedly orthognathous. The photographs reproduced in 
Plates I-III were all taken as nearly as possible with the focal plane of the camera parallel 
or perpendicular to the Frankfort horizontal plane. 


TABLE I

Calvarial measurements of Iranian skulls

Hasan. A    Dinkha    Dehbid    Hasan. B    Hasan. C    Hasan. F    Hasan. G    Hasan. D
Juv.        Juv.      Juv.      ♀           ♀?          ♀           ♀           ♂

Glabella-occipital max. length (L: M.1)     168.5  175  192.5  174  180  189
Max. parietal breadth (B: M.8)              130  132.5  133  126  131  140
Min. frontal breadth (B’: M.9)              92.7  93.0  94.1  86.8  -  96.1
Max. frontal breadth (B’’: M.10)            113.5  112.5?  112.5  103  113  116.5
Biasterionic breadth (M.12)                 95.5  105  103  103  106.5  109
Basio-bregmatic height (H’: M.17)           123  128  133  126.5  139
Chord nasion to bregma (S₁’: M.29)          109.0  93.8  108.7  119.6  100.0  109.4  113.0
Chord bregma to lambda (S₂’: M.30)          110.6  95.8  103.7  112.2?  116.2  115.9  120.2  120.1
Chord lambda to opisthion (S₃’: M.31)       92.8  83.1  91.3  91.1?  102.3  93.5  94.0
Arc nasion to bregma (S₁: M.26)             124  106  124  138  112  128  125.5
Arc bregma to lambda (S₂: M.27)             127.5  108.5  118.5  123?  130  133  133.5  131
Arc lambda to opisthion (S₃: M.28)          106.5  103.5  109.5  112?  127  110  115?
Arc nasion to opisthion (S: M.25)           358  333  359  395  355  372
Horizontal circumference (U: M.23a)         478  27  446  500  532  485
Transverse arc through bregma (BQ’: M.24)   298  262  291  295  312  281  319
Length of foramen magnum (fml: M.7)         32.0  36.0?  36.1  37.0  40.4
Breadth of foramen magnum (fmb: M.16)       27.5  28.0?  25.9?  27.0
Chord nasion to basion (LB: M.5)            3  91.8  100.0  94.8  110.2
100 B/L                                     77.2  80.0  78.7  75.7  69.1  72.4  72.8  74.1
100 H’/L                                    73.0  73.1  69.1  72.7  73.5
100 B/H’                                    105.7  -  103.5  100.0  99.6  100.7
Occipital index (Pearson’s)                 65.3  57.3  60.1  58.1?  58.5  62.1  58.4
100 fmb/fml                                 85.9  -  77.8?  71.7?  73.0  -  -
TABLE II

Facial measurements of Iranian skulls

Dehbid    Hasan. B    Hasan. C    Hasan. F    Hasan. D    Bampur
Juv.      ♀           ♀?          ♀           ♂           ♂

Bizygomatic breadth (J: M.45)               96  120.5  115  128  128.5
Mid-facial breadth (GB: M.46)               70.6  98.7  93.9  87.8  98.2  96.1
Upper facial height (G’H: M.48)             44.2  64.9?  69.0?  59.4  80.9  65.0?
Chord basion to alveolar point (GL)         84.0?  86.9?  89.0  106.1  96.4?
Nasal height (NH, L)                        31.2  52.1  53.5  42.6  60.8  46.2
Nasal breadth (NB: M.54)                    19.5  24.1?  25.2  23.0  24.1  26.0
Orbital breadth, L (O₁L: M.51)              33.4  40.9?  38.6  37.7  45.0  43.0
Orbital height, L (O₂L: M.52)               27.2  33.6  31.5  31.7  35.4  32.7
Palatal length (G₁’: M.62)                  33.4  44.0  40.5  51.6  45.0
Palatal breadth (G₂: M.63)                  43.9  43.0  39.3  44.4
Simotic chord (SC: M.57)                    8.8?  9.4  9.0  11.8  11.3
Subtense to simotic chord (SS)              6.7  2.9  7.4  4.6
100 G’H/GB                                  62.6  65.8?  73.5?  67.7  82.4  67.6
100 NB/NH, L                                62.5  46.3?  47.1  54.0  39.6  56.3
100 O₂/O₁, L                                81.4  82.2?  81.6  84.1  78.7  76.0
100 G₂/G₁’                                  97.7  97.0  86.0
100 SS/SC                                   -  71.3  32.2  62.7  40.7
N∠                                          61.9°?  58.4°?  65.9°  65.2°
A∠                                          75.1°?  79.0°?  76.8°  70.9°
B∠                                          43.0°?  42.6°?  37.3°  43.9°
Alveolar profile angle (P∠)                 88.5°  92.5°?  83.5°  86°  81°
THE PROBABILITY INTEGRAL TRANSFORMATION FOR
TESTING GOODNESS OF FIT AND COMBINING
INDEPENDENT TESTS OF SIGNIFICANCE


By E. S. PEARSON


1. INTRODUCTORY 
If $p(x)$ is the elementary probability law of a continuous random variable $x$
in the interval $a \le x \le b$, so that $p(x) = 0$ for $x < a$ or $x > b$ and

$$\int_a^b p(x)\,dx = 1, \qquad (1)$$

then we may write

$$y = \int_a^x p(x)\,dx. \qquad (2)$$

$y$ is a non-decreasing function of $x$, having values confined to the interval (0, 1).
Further

$$p(y) = p(x)\,\frac{dx}{dy} = 1 \quad (0 \le y \le 1). \qquad (3)$$

In other words the probability law for the integral, $y$, is rectangular, all
values of $y$ between 0 and 1 being equally likely to occur. It follows that if we wish
to use a set of $n$ independent observations $x_1, x_2, \ldots, x_n$ to test the hypothesis $H_0$
that a probability law is of specified form, say $p(x \mid H_0)$, it may be possible to carry
this out by testing the equivalent hypothesis, $h_0$, that the corresponding values
$y_1, y_2, \ldots, y_n$, obtained by means of the transformation (2), have been randomly
drawn from the rectangular distribution (3). The relation between $x_i$ and $y_i$
is illustrated in Fig. 1; corresponding to the abscissae $x_i$ ($i = 1, 2, \ldots, 10$) of the
ten ordinates drawn above, are ten values of $y$ shown below on the scale 0 to 1.
The hypothesis $H_0$ that the ten $x$’s are a random sample from a population distribu-
tion represented by the frequency curve is therefore equivalent to the hypothesis
$h_0$ that the ten $y$’s form a random sample from a rectangular distribution, range
0 to 1.
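
As a rough illustration of the transformation (2), the following Python sketch maps a sample to its probability integrals under a hypothesised normal law. The sample values, the function name and the choice of a standard normal curve as the hypothesised law are assumptions made only for the example; `NormalDist` belongs to the Python standard library.

```python
from statistics import NormalDist


def probability_integral_transform(xs, dist):
    """Map each observation x to y = P(X <= x) under the hypothesised law."""
    return [dist.cdf(x) for x in xs]


# Hypothetical sample of ten observations (illustrative values only).
sample = [-1.3, -0.8, -0.4, -0.1, 0.0, 0.2, 0.5, 0.9, 1.4, 2.1]

# Assumed hypothesised law: normal, mean 0, standard deviation 1.
hypothesised = NormalDist(mu=0.0, sigma=1.0)

ys = probability_integral_transform(sample, hypothesised)
for x, y in zip(sample, ys):
    print(f"x = {x:5.2f}   y = {y:.4f}")
```

If the hypothesised law is the true one, the printed values of y behave like a random sample from the rectangular distribution on 0 to 1.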

If the probability laws $p_i(x)$ are not the same for all the $x$’s, so that

$$y_i = \int_{a_i}^{x_i} p_i(x)\,dx \quad (i = 1, 2, \ldots, n), \qquad (4)$$

the $n$ values of $y_i$ will still be distributed independently as in (3). It follows that
the transformation is applicable not only to problems generally classed under
the heading of tests of goodness of fit, where $p_i(x)$ is the same for all $i$, but also in
another important type of problem where the $x_i$ are a number of independent test
criteria, e.g. a number of values of “Student’s” $t$ or Fisher’s $z$ associated with
differing degrees of freedom, and it is wished to obtain a single test of a com-
prehensive hypothesis. Thus for example we may either:

(a) Test whether it is likely that a sample of ten values of a variable $x$ has been
drawn from a Normal distribution with specified mean and standard deviation,
$\xi_0$ and $\sigma_0$.

(b) Test the hypothesis that there is no difference between the gain in weight
of children fed on (i) raw, (ii) pasteurized milk, using ten values of $t$ obtained from
a comparison in ten age groups of the mean difference in weight increase of children
fed for six months on the two diets.


Fig. 1. Illustration of relation between $x$ and $y$.


Results following from this idea of using the probability integral transforma- 
tion, which seems likely to be one of the most fruitful conceptions introduced 
into statistical theory during the last few years, have been developed by R. A. 
Fisher (1932), Karl Pearson (1933, 1934) and J. Neyman (1937). It is my purpose 
in this article to review and link together some of the suggestions that have been 
put forward. 


2. CHOICE OF THE APPROPRIATE TEST CRITERION 


The probability that in a random sample of size $n$ from the rectangular distri-
bution (3), the $y$’s will fall within the elementary intervals $y_i \pm \tfrac12 dy_i$ ($i = 1, 2, \ldots, n$)
is $dy_1\,dy_2 \ldots dy_n$, i.e. is independent of the particular values of $y$. Thus any set of
values of $y$ is as likely to occur as another. What criterion are we therefore to
use in testing the hypothesis, $h_0$, that the sample has been drawn from the
rectangular population? Established custom in analogous problems might
suggest that we should compare the moments of the sample with those of the
rectangle. But which moments and how many? Fig. 2 shows six possible
$y$-samples of size $n = 10$; of these sample (a) is likely to have moments agreeing
most closely with those of the rectangle. Nevertheless each of the spot patterns
illustrated is equally likely to occur in sampling if $h_0$ is true, and to assume that
the test must be based on moments would appear to prejudice the issue.
Fig. 2. Six possible $y$-samples of size $n = 10$.


Following what may be described as the intuitional line of approach, K.
Pearson (1933)* suggested as suitable test criteria one or other of the products†

$$Q_1 = y_1 y_2 \cdots y_n \qquad (5)$$

or

$$Q_1' = (1 - y_1)(1 - y_2) \cdots (1 - y_n). \qquad (6)$$

Here $Q_1$ is the joint probability that in random sampling from $p_i(x)$ the $n$ values
of $x$ will be as small or smaller than the corresponding observed values; $Q_1'$ is
the probability that they will be as great or greater than their observed values.
In Fig. 2, sample (b) will give a relatively low value to $Q_1$, and a relatively high
value to $Q_1'$; for sample (c) the position is reversed. To form a complete statistical
test it is clearly necessary to know how these $Q$ criteria are distributed in random
sampling if the hypothesis $h_0$ regarding the $y$’s, and therefore the hypothesis
$H_0$ regarding the $x$’s, were true.

* R. A. Fisher (1932) was primarily concerned with a combination of tests of significance, where
the distinction between $Q_1$ and $Q_1'$ did not arise in the same way.
† K. Pearson denoted these products by $\lambda_n$.
By means of a simple transformation to new variables

$$v_i = -2\log_e y_i \quad (i = 1, 2, \ldots, n), \qquad (7)$$

it is easy to show that $-2\log_e Q$ is distributed as $\chi^2$ with degrees of freedom
$f = 2n$, i.e.

$$p(\chi^2) = \frac{1}{2^{f/2}\,\Gamma(f/2)}\,(\chi^2)^{f/2 - 1}\,e^{-\chi^2/2}. \qquad (8)$$

Exceptionally small values of $Q_1$ or $Q_1'$ correspond to large values of $\chi^2$. Thus
a straightforward test is available which, on choice of the appropriate probability
level from the $\chi^2$ tables, gives a precise control of the risk of rejecting the hypo-
thesis tested regarding the $p_i(x)$ when it is true.
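
The test just described can be sketched numerically. The snippet below is a minimal illustration and not part of the original paper: it forms minus twice the natural logarithm of the product of the y's and evaluates the upper tail of the χ² distribution with 2n degrees of freedom, using the finite series that is available when the degrees of freedom are even. The function names are hypothetical.

```python
import math


def chi2_upper_tail_even_df(x, n):
    """P(chi-square with 2n degrees of freedom exceeds x), closed form for even df."""
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, n):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def product_test(ys):
    """Return (-2 ln Q, tail probability) for Q = y1 * y2 * ... * yn."""
    stat = -2.0 * sum(math.log(y) for y in ys)
    return stat, chi2_upper_tail_even_df(stat, len(ys))


def product_test_upper(ys):
    """Same test applied to the product of the complements 1 - y."""
    return product_test([1.0 - y for y in ys])


# Hypothetical probability integrals, e.g. from five independent tests.
ys = [0.08, 0.21, 0.03, 0.40, 0.12]
stat, prob = product_test(ys)
print(f"-2 ln Q = {stat:.3f}, tail probability = {prob:.4f}")
```

With a single value of y the tail probability returned is y itself, which gives a convenient check on the series.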

In discussing the application of this test K. Pearson was aware of the difficulty
of choice between $Q_1$ and $Q_1'$. From which tail of the distributions should the
probability integral be calculated? He suggested that the smaller of the two
should be used as giving the “more stringent test”. It may be noted that as an
alternative to $Q_1$ and $Q_1'$ a third criterion may be used, namely

$$Q_2 = \prod_{i=1}^{n} y_i', \qquad (9)$$

where

$$y_i' = \begin{cases} 2\int_a^{x_i} p_i(x)\,dx = 2y_i & \text{if } x_i \text{ is below median } x, \\[4pt] 2\int_{x_i}^{b} p_i(x)\,dx = 2(1 - y_i) & \text{if } x_i \text{ is above median } x. \end{cases} \qquad (10)$$

It is seen that $y'$ follows the rectangular distribution (3) if $H_0$ is true, and there-
fore $-2\log_e Q_2$ is also distributed as $\chi^2$ with $f = 2n$. The criterion $Q_2$ will be
exceptionally small if the $x$’s lie towards either tail of their probability distribu-
tions,* e.g. in sample (d) of Fig. 2; it will be exceptionally large for sample (e).
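
The two-tailed criterion of (9) and (10) may be sketched in the same way. This is an illustrative fragment only; it relies on the fact that an observation lies below the median of its law exactly when its probability integral y is less than one half, and the function names are hypothetical.

```python
import math


def two_tailed_values(ys):
    """y' = 2y when y is below one half, otherwise 2(1 - y), as in (10)."""
    return [2.0 * y if y < 0.5 else 2.0 * (1.0 - y) for y in ys]


def minus_two_log_q_two_tailed(ys):
    """-2 ln of the product of the y' values, the statistic referred to chi-square."""
    return -2.0 * sum(math.log(v) for v in two_tailed_values(ys))


# Hypothetical y's lying towards both tails give a large statistic.
tails = [0.02, 0.97, 0.05, 0.99, 0.03]
centre = [0.45, 0.52, 0.49, 0.55, 0.50]
print(minus_two_log_q_two_tailed(tails))
print(minus_two_log_q_two_tailed(centre))
```

The statistic is large when the y's crowd towards 0 and 1 and small when they cluster about one half, in line with the remarks on samples (d) and (e).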

Provided that the test based on one of the products $Q$ is being used to combine
together a number of independent tests of significance, the intuition which led
to its choice appears on the whole to be sound, though it cannot be claimed that
it is necessarily the best test. In such a problem the separate test criteria $x_i$
(whether $t$, $z$, $r$, $\chi^2$, etc.) have been chosen so that small values of $y_i$ or of $1 - y_i$
suggest that the individual hypotheses are improbable. Consequently a small
value of $Q$ is essentially associated with improbability of the combined result.
Nor will it generally be difficult to decide on a priori grounds which of the three
forms of $Q$ is appropriate.† In the case of tests of goodness of fit, however, when
it is wished to test whether a sample $x_1, x_2, \ldots, x_n$ can have been randomly drawn
from a population with probability law $p(x) = p(x \mid H_0)$, there appear to be no
a priori reasons for choosing the $Q$ type of criterion based on the product of the
probability integrals. When all the forms of pattern of the $y$’s, as shown in Fig. 2,
are equally likely to occur if $h_0$ be true, how, it must be asked, are we to settle
when the hypothesis should be rejected? It seems only possible to proceed further
by specifying what other forms of probability law are to be regarded as possible
alternatives to $p(x \mid H_0)$.

* This form of the criterion appears first to have been defined precisely in print by P. V.
Sukhatme (1935, p. 587).

† It is of course important not to make the decision as to which end of the $x$-distribution to
start from in taking the integral depend on the observed values of the $x$’s.

Denote by $p(x \mid H_1)$ some alternative law. If now this is the true probability
law, but the $y$’s have been calculated from equations (2) on the assumption that
$p(x) = p(x \mid H_0)$, then, as Neyman has pointed out,

$$p(y \mid h_1) = \frac{p(x \mid H_1)}{dy/dx} = \left.\frac{p(x \mid H_1)}{p(x \mid H_0)}\right|_{x = f(y)}, \qquad (11)$$

where $f(y)$ means the solution of

$$y = \int_a^x p(x \mid H_0)\,dx \qquad (12)$$

with regard to $x$. Thus the probability distribution of $y$, when $H_1$ is true, is obtained
by calculating at points $x = f(y)$ the ratio of the ordinates of the true and hypo-
thetical probability functions. As an example, suppose that we are using $n$ values
of $x$ to test the hypothesis that the sampled population is represented by a normal
curve with mean at zero and unit standard deviation. Then

$$p(x \mid H_0) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}. \qquad (13)$$


Consider what would be the equation of $p(y \mid h_1)$ if the following had been the
true forms of the population sampled:

$$\text{(I)}\quad p(x \mid H_1) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 (x - 0.5)^2}, \qquad (14)$$

a normal curve with mean at +0.5 and unit standard deviation.

$$\text{(II)}\quad p(x \mid H_1) = \frac{2}{3\sqrt{2\pi}}\,e^{-\frac12 (2x/3)^2}, \qquad (15)$$

$$\text{(III)}\quad p(x \mid H_1) = \frac{3}{2\sqrt{2\pi}}\,e^{-\frac12 (3x/2)^2}, \qquad (16)$$

normal curves with means at zero and standard deviations of 3/2 and 2/3 respectively.

$$\text{(IV)}\quad p(x \mid H_1) = c\left(1 + \tfrac12 x\sqrt{\beta_1}\right)^{4/\beta_1 - 1} e^{-2x/\sqrt{\beta_1}}, \quad \text{where (a) } \sqrt{\beta_1} = 0.4,\ \text{(b) } \sqrt{\beta_1} = 0.7, \qquad (17)$$

Pearson Type III curves with mean at zero, unit standard deviation and
$\beta_1 = 0.16$ or 0.49.
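
The ratio in (11) is easily evaluated numerically when both laws are normal. The sketch below is an illustration under stated assumptions, not the author's computation: it takes the hypothesised law (13) and alternative (I), a normal curve with mean 0.5 and unit standard deviation, and evaluates the density of y on a hypothetical grid of interior points. The end points y = 0 and y = 1 are left out because the normal quantile is infinite there. `NormalDist` is from the Python standard library; the function name is hypothetical.

```python
from statistics import NormalDist


def density_of_y(y, hypothesised, alternative):
    """Ratio of the true to the hypothesised ordinate at x = f(y), as in (11)."""
    x = hypothesised.inv_cdf(y)   # f(y): the quantile of y under the hypothesised law
    return alternative.pdf(x) / hypothesised.pdf(x)


hypothesised = NormalDist(0.0, 1.0)   # law (13)
shifted = NormalDist(0.5, 1.0)        # alternative (I)

# Assumed grid of interior points for illustration.
points = [0.05, 0.10, 0.20, 0.30, 0.40, 0.50, 0.60, 0.70, 0.80, 0.90, 0.95]
curve = [(y, density_of_y(y, hypothesised, shifted)) for y in points]
for y, d in curve:
    print(f"y = {y:.2f}   density = {d:.4f}")
```

For this alternative the density rises steadily with y, so that values of y near 1 become more frequent than under the rectangular law; passing a different `NormalDist` as the alternative gives the corresponding curve for a change in standard deviation.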


Values for $p(y \mid h_1)$ were calculated from (11), corresponding to the points
$y$ = 0, 0.05, 0.10, 0.20, ..., 0.80, 0.90, 0.95, 1.00;* the resulting curves are drawn
in Fig. 3. They represent a number of different forms of departure from the
rectangular $y$-distribution, corresponding in $p(x)$ to: (I) a shift in mean; (II)
and (III) changes in standard deviation; (IV) a change in shape. Clearly, in Fig. 2,
samples (c), (d), (e) and (f) are of patterns we might expect to find when testing
$H_0$, if the populations sampled differed from (13) in the directions of (14), (15),
(16) and (17) respectively.

Fig. 3. Alternatives to $p(x \mid H_0) = \frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}$: curves of $p(y \mid h_1)$ for cases I to IV.



The questions, therefore, that need consideration appear to be the following. 


* For the Type III curve, the tables of ordinates entered against a standardized abscissa, 
published by L. R. Salvosa (1930), were found very useful.

In testing, on observed sample values, whether $p(x \mid H_0)$ represents the population
probability law,

(i) Can we define in what way the true probability law may diverge from that
specified by $H_0$ (e.g. in location, scaling, shape, etc., one or all)?

(ii) If this is possible, can we determine the most efficient test to apply to the
y's in order to detect such divergence if it exists?

(iii) If the definition required in (i) is impossible, how far can we determine
what may be called a useful "omnibus" test, sensitive as far as possible to many
forms of divergence?

It should be noted, and this point must be emphasized, that it is fundamental
to any procedure that we may base on the distribution of y that the n transformed
observations $y_1, y_2, \ldots, y_n$ are independent. If the function $p(x \mid H_0)$ is obtained
by fitting a frequency curve to a set of observed x's, this condition will not be
satisfied by the resulting y's. For example, had the curve been fitted by equating
the first two moments of the theoretical distribution to those of the observations,
types of pattern like those suggested in samples (b), (c), (d) and (e) of Fig. 2 would
probably be ruled out, and the distribution of $-2\log_e Q$ could no longer be that of
$\chi^2$. Whether some method of applying a test to the y's can still be devised under
these conditions has yet to be investigated.
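
The effect of fitting can be illustrated by a small sampling experiment. The sketch below is not part of the original paper: it assumes normal samples of n = 10, takes Q as the simple product of the y's, and compares the case where the hypothesis is specified in advance with the case where the normal curve is fitted by the first two moments of the same sample. The sample size, number of repetitions and seed are arbitrary choices made for illustration.

```python
import math
import random
import statistics


def normal_cdf(z):
    """Probability integral of the standard normal curve."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def minus_two_log_q(ys):
    """-2 log_e Q, where Q is the product of the transformed values y."""
    return -2.0 * sum(math.log(y) for y in ys)


def simulate(n=10, reps=2000, seed=1):
    """Return the criterion under (a) a fully specified and (b) a fitted curve."""
    rng = random.Random(seed)
    known, fitted = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # (a) hypothesis specified in advance: mean 0, standard deviation 1
        known.append(minus_two_log_q([normal_cdf(x) for x in xs]))
        # (b) normal curve fitted by the first two sample moments
        m = statistics.mean(xs)
        s = statistics.pstdev(xs)
        fitted.append(minus_two_log_q([normal_cdf((x - m) / s) for x in xs]))
    return known, fitted


known, fitted = simulate()
# chi-square with 2n degrees of freedom has mean 2n and variance 4n
var_known = statistics.pvariance(known)
var_fitted = statistics.pvariance(fitted)
```

Under these assumptions `var_known` should lie near 4n = 40, while `var_fitted` is much smaller, which is one way of seeing that the fitted y's no longer behave as n independent rectangular variables.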

It must also be borne in mind that once we admit it to be necessary to take into
account the form of the alternative hypotheses, a difference in character appears
between the goodness of fit problem and that which is concerned with combining
independent tests of significance. In the former case, if $H_0$ is not true, we suppose
there exists some common alternative form $p(x \mid H_1)$ appropriate for all the x's,
and hence a common $p(y \mid h_1)$. In the latter case, while the different $p_i(x \mid H_0)$
will lead on transformation to a common $p(y \mid h_0) = 1$, the alternatives $p_i(x \mid H_1)$
will not necessarily lead to a common $p(y \mid h_1)$ for all the test criteria. In the two
following sections it is primarily the first type of problem that will be considered;
the conclusions reached will, however, throw some light on the position obtaining
in the second case.


3. A PARTIAL SOLUTION BASED ON THE PRODUCT CRITERIA, Q 


The curves corresponding to Cases I, II and III in Fig. 3 could all be graduated
roughly by Pearson Type I curves of the form

$$p(y) = p(y \mid h_1) = \frac{\Gamma(m_1 + m_2 + 2)}{\Gamma(m_1 + 1)\,\Gamma(m_2 + 1)}\, y^{m_1}(1 - y)^{m_2}. \qquad (18)$$


In the case when the hypothesis tested is true, i.e. $h_1 = h_0$, the rectangular
distribution results from setting $m_1 = m_2 = 0$. The curve

$$p(y \mid h_1) = (m+1)(1-y)^{m}, \quad -1 < m \le 0, \qquad (19)$$

while it has an ordinate of value $(m+1) < 1$ at y = 0, provides an approximation
to the form of the curve in Case I. Again the curve

$$p(y \mid h_1) = \frac{\Gamma(2m+2)}{\{\Gamma(m+1)\}^2}\, y^{m}(1-y)^{m} \qquad (20)$$

can be made to represent the y-distributions of Case II (m < 0) and Case III
(m > 0). No Type I curve can represent the y-distributions in Case IV.
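
As a minimal sketch (not taken from the paper), the Type I ordinate (18) and its two special forms may be evaluated as follows. The function names are hypothetical and chosen only for illustration.

```python
import math


def type_i_density(y, m1, m2):
    """Ordinate of the Type I curve (18) for 0 <= y <= 1."""
    const = math.gamma(m1 + m2 + 2) / (math.gamma(m1 + 1) * math.gamma(m2 + 1))
    return const * y ** m1 * (1.0 - y) ** m2


def j_curve(y, m):
    """Special form (19): m1 = 0, m2 = m, giving (m + 1)(1 - y)^m."""
    return type_i_density(y, 0.0, m)


def symmetric_curve(y, m):
    """Special form (20): m1 = m2 = m."""
    return type_i_density(y, m, m)
```

With `m1 = m2 = 0` the ordinate is 1 throughout, i.e. the rectangular distribution of the hypothesis tested; with `m = -0.5` the J-curve has ordinate m + 1 = 0.5 at y = 0.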

Starting from (18), or its special forms (19) and (20) as representing the
possible alternatives, it is of interest to see what criterion, for testing the hypothesis
$h_0$ (that p(y) is rectangular), flows from the application of the likelihood
method which J. Neyman and the present writer have made frequent use of in
other problems.

This method consists of the following procedure:

(1) Given a sample of n independent observations $y_1, y_2, \ldots, y_n$, their joint
elementary probability law if $h_0$ be true, is,

$$p(y_1, y_2, \ldots, y_n \mid h_0) = 1, \qquad (21)$$

while if any other member of the admissible set of alternatives is true, it is

$$p(y_1, y_2, \ldots, y_n \mid h_1) = \left\{\frac{\Gamma(m_1 + m_2 + 2)}{\Gamma(m_1 + 1)\,\Gamma(m_2 + 1)}\right\}^{n} \prod_{i=1}^{n} y_i^{m_1}(1 - y_i)^{m_2}. \qquad (22)$$

(2) Determine the values of $m_1$ and $m_2$ which make (22) a maximum, and call
the corresponding maximized function $p(y_1, y_2, \ldots, y_n \mid h \max)$.

(3) Then the likelihood ratio criterion for testing $h_0$ will be $\lambda$, where

$$\lambda = \frac{p(y_1, y_2, \ldots, y_n \mid h_0)}{p(y_1, y_2, \ldots, y_n \mid h \max)}. \qquad (23)$$

Taking the form (19) to represent $p(y \mid h_1)$, we have only one parameter, m,
to determine

$$\log p(y_1, y_2, \ldots, y_n \mid h_1) = n\log(m+1) + m\log\prod_{i=1}^{n}(1 - y_i). \qquad (24)$$

Whence

$$\frac{\partial \log p}{\partial m} = \frac{n}{m+1} + \log Q_1',$$

where $Q_1'$ is defined in (6) above. Equating this expression to zero, it is seen that
a maximum solution is given by

$$m + 1 = -\frac{n}{\log Q_1'} = \frac{2n}{\chi^2}, \qquad (25)$$

where

$$\chi^2 = -2\log_e Q_1', \qquad (26)$$

provided that $\chi^2 > 2n$. If $\chi^2 < 2n$, since $m \le 0$, the maximum solution is given
by m = 0.

Consequently we find that

$$\lambda = \left(\frac{\chi^2}{2n}\right)^{n} e^{\,n - \frac{1}{2}\chi^2} = (2n)^{-n}(\chi^2)^{n} e^{\,n - \frac{1}{2}\chi^2}, \qquad (27)$$

provided that $\chi^2 > 2n$; if $\chi^2 < 2n$ then $\lambda = 1$.*

Thus $\lambda \to 0$ and the hypothesis tested becomes less and less likely when
$\chi^2 \to \infty$ and $Q_1' \to 0$. If the hypothesis is true, then we know from the discussion
on p. 137 above that $-2\log Q_1'$ is distributed in the standard $\chi^2$ form with 2n
degrees of freedom.
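
The steps leading to (25)-(27) can be sketched numerically. The code below is an illustration only, with hypothetical function names; it takes $Q_1'$ as the product of the $(1 - y_i)$, as in (24), and uses the closed form of the $\chi^2$ upper tail that holds when the degrees of freedom are even.

```python
import math


def chi2_upper_tail(x, dof):
    """P(chi-square > x) for an even number of degrees of freedom."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("closed form used here needs an even, positive dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for j in range(1, dof // 2):
        term *= half / j          # (x/2)^j / j!
        total += term
    return math.exp(-half) * total


def q1_prime_test(ys):
    """Return chi-square of (26), lambda of (27) and the upper-tail probability."""
    n = len(ys)
    chi2 = -2.0 * sum(math.log(1.0 - y) for y in ys)
    if chi2 > 2 * n:
        lam = (chi2 / (2 * n)) ** n * math.exp(n - chi2 / 2.0)
    else:
        lam = 1.0                 # maximum at the boundary m = 0
    return chi2, lam, chi2_upper_tail(chi2, 2 * n)


chi2, lam, tail = q1_prime_test([0.90, 0.95])
```

Small values of `lam` and of `tail` both point the same way, since for $\chi^2 > 2n$ the criterion (27) is a decreasing function of $\chi^2$.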

But not only is the test based on $Q_1'$ that derived from the $\lambda$-criterion; it may
be easily shown that it is the uniformly most powerful test† of the hypothesis $h_0$
with regard to the set of alternatives defined by (19). In other words if the
admissible alternatives to $H_0$ lead to forms $p(y \mid h_1)$ following the J-curve (19),
then the test based on $Q_1'$ or, if the integrals are more appropriately calculated
from the lower terminal, on $Q_1$ has the following unique property: it is impossible
to find any other test which gives a larger chance of detecting departure of the probability
law from the specified form $p(x \mid H_0)$. A fresh light seems therefore to be thrown on
the product criteria $Q_1$ and $Q_1'$. While the form (19) will not be exactly followed in
practice, a little reflection on the matter suggests that it will represent the
general characteristics of the departure of p(y) from the rectangle if the possible
changes in p(x) correspond to a translation of the whole p(x) distribution to right
(or left). It is of interest to note that apart from its application in goodness of fit
tests, this is also the kind of change we may often expect when the x's are the
criteria used in a number of independent tests of significance. Thus, if some general
hypothesis is not true, a number of independent values of "Student's" t may be
distributed approximately about some common mean value other than zero;
while the shape and standard deviations of these modified t-distributions will also
be altered, the changes involved would relatively be much less than in the mean.

If now we start from the form (20), which as has been pointed out will represent
approximately the curves of Cases II and III in Fig. 3, we may proceed to calculate
the $\lambda$-criterion in a similar manner. The equation to solve for m, to obtain
$p(y_1, y_2, \ldots, y_n \mid h \max)$, is

$$\frac{\partial \log \Gamma(2m+2)}{\partial m} - \frac{2\,\partial \log \Gamma(m+1)}{\partial m} = -\frac{1}{n}\log(Q_1 Q_1'), \qquad (28)$$


* Since the admissible alternatives have been restricted to those defined by (19), i.e. with
$-1 < m \le 0$, we cannot reject $H_0$ when high values of $Q_1'$ or low values of $\chi^2$ are obtained from the
data. Thus in such cases the value of the $\lambda$-criterion is unity, suggesting no reason for rejecting $H_0$.
If however we take $-1 < m < \infty$, equation (19) will now represent J-curves with maxima either at
y = 1 or y = 0. We are then aiming at a test which is sensitive to translation of $p(x \mid H_1)$ both to
right and to left of $p(x \mid H_0)$, and $\lambda \to 0$ either when $Q_1' \to 0$ or when $Q_1' \to 1$.

† See Neyman and Pearson (1933a, b). The proof that the test possesses this property follows
from the results given on pp. 298-302 of the earlier paper.

where

$$Q_1 Q_1' = \prod_{i=1}^{n} y_i(1 - y_i). \qquad (29)$$

Thus it appears that $\lambda$, if determined, would be a function of $Q_1 Q_1'$. Without
attempting to go further into the problem it may be noted that a test criterion
depending on $Q_1 Q_1'$ is likely to be rather closely correlated with the criterion $Q_2$,
defined in equations (9) and (10). It will be seen that


$$Q_2 = \prod_{i=1}^{n}\left(1 - 2\left|y_i - \tfrac{1}{2}\right|\right), \qquad (30)$$

and the functions (a) $1 - 2\left|y_i - \tfrac{1}{2}\right|$, and (b) $4y_i(1 - y_i)$ both equal zero when $y_i = 0$,
increase monotonically to 1 when $y_i = \tfrac{1}{2}$ and then decrease to 0 as $y_i$ increases to 1.
In so far as this correspondence exists, it points to $Q_2$ being an appropriate
criterion when the alternatives to $p(x \mid H_0)$ are likely to have the same mean but
either larger or smaller standard deviations.
Using the more general Type I form of equation (18), it is found that $\lambda$ will
be a function of both $Q_1$ and $Q_1'$, but not of $Q_1 Q_1'$.
Finally it must again be noted that (18) cannot represent the curves shown as
Case IV in Fig. 3, which arose when the probability distribution $p(x \mid H_1)$ had
the same mean and standard deviation as $p(x \mid H_0)$ but was a skew rather than
a normal curve. It is noted however that for both alternatives represented,
i.e. Type III curves with $\beta_1$ = 0.16 and 0.49 respectively, the gradient of $p(y \mid h_1)$
increases approximately from y = 0 to 0.2, decreases from y = 0.2 to 0.8, and
increases again from y = 0.8 to 1.0. Bearing in mind that a criterion of the type
$Q_1' = \prod_{i=1}^{n}(1 - y_i)$ appears to be efficient in detecting the existence of an increasing
gradient as in Case I, the following criterion is tentatively suggested as suitable
to detect the presence of skewness:

$$Q_3 = \prod_{i=1}^{n} y_i', \qquad (31)$$

where

$$\begin{aligned}
y_i' &= 5(0.2 - y_i) && \text{for } 0 \le y_i \le 0.2,\\
y_i' &= \tfrac{5}{3}(y_i - 0.2) && \text{for } 0.2 \le y_i \le 0.8,\\
y_i' &= 5(1 - y_i) && \text{for } 0.8 \le y_i \le 1.
\end{aligned} \qquad (32)$$

It will be found that if $y_i$ follows the rectangular distribution, so also does $y_i'$.
Thus $-2\log_e Q_3$ will again be distributed as $\chi^2$ with f = 2n, if $H_0$ is true.
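
The folding transformations behind $Q_2$ and $Q_3$ can be written down directly. The sketch below is illustrative only; the function names are hypothetical, and the coefficient 5/3 in the middle branch is the value required for $y'$ to remain rectangular (the three branches then contribute 1/5 + 3/5 + 1/5 = 1 to the density of $y'$).

```python
import math


def fold_q2(y):
    """Transformation used in Q_2: 1 - 2|y - 1/2|."""
    return 1.0 - 2.0 * abs(y - 0.5)


def fold_q3(y):
    """Transformation (32) used in the tentative skewness criterion Q_3."""
    if y <= 0.2:
        return 5.0 * (0.2 - y)
    if y <= 0.8:
        return (5.0 / 3.0) * (y - 0.2)
    return 5.0 * (1.0 - y)


def product_criterion(ys, transform):
    """Product of the transformed y's, e.g. Q_2 or Q_3."""
    return math.prod(transform(y) for y in ys)


# Check on a fine grid that y' = fold_q3(y) is again rectangular:
grid = [(i + 0.5) / 10000 for i in range(10000)]
share_below = sum(fold_q3(y) <= 0.3 for y in grid) / len(grid)
```

Here `share_below` should be close to 0.3, as it would be for any rectangular variable; `product_criterion(ys, fold_q3)` then gives $Q_3$, and minus twice its logarithm may be referred to $\chi^2$ with 2n degrees of freedom.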

The difference in the character of the critical regions of the tests associated
with $Q_1$, $Q_2$ and $Q_3$ may be illustrated diagrammatically for the case n = 2, where
for clearness a 20% significance level (rather than, say, 0.05 or 0.01) has been
taken. In each case the hypothesis $H_0$ (or $h_0$) would be rejected if the sample
point $(y_1, y_2)$ falls within the shaded regions; if $H_0$ be true the sample point is
equally likely to fall anywhere within the unit square, so that the area of the
shaded portions must be 20% of the whole. The boundaries of these regions were
obtained from the $\chi^2$-transformation. Thus for f = 4, the upper and lower 20%
levels for $\chi^2$ are 1.649 and 5.989 respectively, giving corresponding levels for
Q of 0.0501 and 0.4385. To determine the boundaries it is then necessary to find
the co-ordinates of $y_1$ and $y_2$ satisfying, (i) equation (5) for $Q_1$; (ii) equations (9)
and (10) for $Q_2$; (iii) equations (31) and (32) for $Q_3$. A sample such as (b) of Fig. 2
will give a y-point in the n-dimensioned cube which is likely to fall into the critical
region of the type shown in Fig. 4 (i); a sample as (d) is likely to give a point falling
into a region of the outer ring type of Fig. 4 (ii), while a sample like (e) will give
a point falling in the central lozenge-shaped type of region of the same diagram.
On the other hand samples like (f), which seem likely to arise when $p(x \mid H_1)$
has the same mean and standard deviation but greater positive skewness than
$p(x \mid H_0)$, will tend to give points in the more complicated region of the type of
Fig. 4 (iii), i.e. points with y values between 0.1 and 0.4 or above 0.9.

[Fig. 4, panels (i), (ii) and (iii): critical regions in the unit square for n = 2.]

The suggestion regarding $Q_3$ is only put forward tentatively. But it appears
that in so far as we know the kind of departure from $p(x \mid H_0)$ to be expected, and
therefore know the points within the n-dimensioned y-hypercube round which
the sample points are likely to cluster, it should be possible to construct appropriate
tests of the Q-type on the lines suggested in the case of $Q_3$. The sampling
distribution of such criteria will be always exactly known if $H_0$ is true,* through
the transformation $-2\log_e Q = \chi^2$, while their efficiency in detecting that $H_0$
is false can be secured on a basis which, if crude, has a definite guiding principle
behind it. For a more precise handling of the problem Dr Neyman's work on
"smooth tests" must be considered.
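
The 20% levels quoted for n = 2 can be checked with a few lines of arithmetic. This is a sketch with hypothetical function names; it relies on the closed form of the $\chi^2$ tail for 4 degrees of freedom and on the elementary result that the part of the unit square with $y_1 y_2 \le q$ has area $q(1 - \log_e q)$.

```python
import math


def chi2_upper_tail_4df(x):
    """P(chi-square > x) for 4 degrees of freedom."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


def q_level(chi2_value):
    """Level of Q corresponding to a level of chi-square = -2 log_e Q."""
    return math.exp(-chi2_value / 2.0)


def area_product_below(q):
    """Area of the unit square in which y1 * y2 <= q, for 0 < q <= 1."""
    return q * (1.0 - math.log(q))


upper_tail = chi2_upper_tail_4df(5.989)        # chance of exceeding 5.989
lower_tail = 1.0 - chi2_upper_tail_4df(1.649)  # chance of falling below 1.649
small_q = q_level(5.989)
large_q = q_level(1.649)
shaded_area = area_product_below(small_q)
```

Both tail chances and the shaded area come out at about 0.20, and the two Q levels agree with the figures 0.0501 and 0.4385 given in the text to the accuracy of the tabled $\chi^2$ values.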


4. Dr J. NEYMAN’S METHOD OF CHOOSING APPROPRIATE TEST CRITERIA 


Neyman (1937) deals with the goodness of fit type of problem, that is to say,
he supposes that if p(y) is not a rectangle, then some single alternative $p(y \mid h_1)$
is appropriate for all observations. The system of curves which he has taken to
represent the possible alternatives is

$$p(y \mid h_1) = p(y \mid \theta_1, \theta_2, \ldots, \theta_k) = c\, \exp\left\{\sum_{i=1}^{k} \theta_i \pi_i(y)\right\}. \qquad (33)$$

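These curves depend on k parameters $\theta_i$ which are at our choice; if all the
$\theta_i$'s are zero, $p(y \mid h_1) = p(y \mid h_0)$. c is a function of the $\theta_i$'s. Further $\pi_1, \pi_2, \ldots, \pi_k$
are a system of polynomials in y, orthogonal and standardized in the interval (0, 1)
of which the first few are as follows:

$$\begin{aligned}
\pi_1(y) &= \sqrt{12}\,(y - \tfrac{1}{2}),\\
\pi_2(y) &= \sqrt{5}\,\{6(y - \tfrac{1}{2})^2 - \tfrac{1}{2}\},\\
\pi_3(y) &= \sqrt{7}\,\{20(y - \tfrac{1}{2})^3 - 3(y - \tfrac{1}{2})\},\\
\pi_4(y) &= 210(y - \tfrac{1}{2})^4 - 45(y - \tfrac{1}{2})^2 + \tfrac{9}{8}.
\end{aligned}$$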
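
That the four polynomials listed are orthogonal and standardized on (0, 1) can be verified numerically. The sketch below is an illustration only: it uses a simple Simpson rule, and the helper names are hypothetical. The constant term of $\pi_4$ is taken as 9/8, the value needed for orthogonality to the constant function.

```python
import math


def neyman_polynomials(y):
    """Values of pi_1, ..., pi_4 at y."""
    u = y - 0.5
    return [
        math.sqrt(12.0) * u,
        math.sqrt(5.0) * (6.0 * u ** 2 - 0.5),
        math.sqrt(7.0) * (20.0 * u ** 3 - 3.0 * u),
        210.0 * u ** 4 - 45.0 * u ** 2 + 9.0 / 8.0,
    ]


def simpson(f, a=0.0, b=1.0, steps=2000):
    """Composite Simpson rule with an even number of steps."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def inner_product(r, s):
    """Integral over (0, 1) of pi_{r+1}(y) * pi_{s+1}(y)."""
    return simpson(lambda y: neyman_polynomials(y)[r] * neyman_polynomials(y)[s])


gram = [[inner_product(r, s) for s in range(4)] for r in range(4)]
```

The matrix `gram` should be the unit matrix to within the accuracy of the quadrature, and each polynomial should also integrate to zero over (0, 1).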


This form for $p(y \mid h_1)$ was chosen by Neyman partly for simplicity in the
development of the appropriate tests and partly on the grounds that any function
having the characteristics of log p(y) can be represented by a series of such
orthogonal polynomials $\pi_r(y)$. How many and which terms of such a series are
needed to represent curves of such varied form as those shown in Fig. 3 has still
to be explored. It will be noted that using only $\pi_1(y)$, (33) gives an exponential
which will correspond roughly to Case I, Fig. 3. Again $\pi_2(y)$ will lead to a curve
that will approximate to Cases II and III, according as $\theta_2$ is positive or negative,
while $\pi_3(y)$ will introduce a point of inflexion of the kind shown for Case IV.
Nevertheless it will be seen that the form (33) may need a considerable number of
terms before it will make $p(y \mid h_1)$ approach the values of 0 or $\infty$ at y = 0 and 1.

* In this property the tests are more exact than Neyman's tests discussed in the next section,
since the sampling distributions of his criteria are only approximate for small values of n.

In some cases therefore the Pearson Type I curve of (18) may be more suitable
than (33). It must be remembered, however, that the curves drawn in Fig. 3
are somewhat exceptional, since the differences between the $p(x \mid H_0)$ of (13)
on the one hand and the alternatives (14)-(17) on the other are relatively large.
In any practical case where n is not too small, one would hope to be able to detect
much smaller differences, i.e. to be dealing with alternative distributions $p(y \mid h_1)$
differing less drastically from the rectangular $p(y \mid h_0) = 1$.

Starting from the basis of equation (33), and assuming that n is not too small,
Neyman has developed a series of tests, relatively simple to apply, which he calls
"smooth tests" and which have the following properties.

(a) The particular test which is most appropriate will depend upon the
number of polynomials needed in (33) to represent the type of departure from
the rectangular form likely to be met with in $p(y \mid h_1)$. This is a point at which
practical experience must be introduced. Let it be supposed that in a given
problem the first k polynomials are regarded as adequate.

(b) The test is so adjusted that when $H_0$ (or $h_0$) is true, i.e. when
$\theta_1 = \theta_2 = \ldots = \theta_k = 0$, the significance level may be fixed at any desired magnitude,
e.g. at 0.05 or 0.01.

(c) If $H_0$ be not true, the test is unbiassed in the sense of Neyman and Pearson
(1936, 1938), and is more likely than any other unbiassed test to detect departures
from zero in the k parameters $\theta_i$, i.e. to detect that in the place of $p(x \mid H_0)$ some
alternative form of law $p(x \mid H_1)$ holds good.

(d) The chance of detection, or the power of the test in Neyman and Pearson's
terminology, in the neighbourhood of $\theta_1 = \theta_2 = \ldots = \theta_k = 0$ is approximately a function of


(e) For alternatives to $p(x \mid H_0)$ which lead to a function $p(y \mid h_1)$ needing
for its representation more than the first k polynomials, the test will not be
sensitive. This means that for an "omnibus" test capable of detecting all manner
of departures from the rectangle, we may require to introduce a considerable
number of polynomial terms. Such a test will however be less efficient in detecting
those forms of departure which one or two polynomial terms would be adequate
to represent.

then Neyman's criterion for the kth order test is

$$\psi_k^2 = \sum_{r=1}^{k} u_r^2, \qquad (38)$$

which is approximately distributed as $\chi^2$ with k degrees of freedom. The approximation
is due to the fact that while, if $h_0$ is true, the u's have each an expectation
of zero, a unit standard deviation and are uncorrelated (i.e. the correlation coefficient
between any two of them is zero), they are not independent nor exactly
normally distributed. As the sample size, n, increases the accuracy will rapidly
improve.
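
A minimal sketch of the kth order criterion follows. The definition of the u's is not reproduced in the text as it stands, so the code assumes the usual form $u_r = n^{-1/2}\sum_i \pi_r(y_i)$, which has zero expectation and unit standard deviation when the y's are rectangular; this form, and the function names, are assumptions made for illustration.

```python
import math


def neyman_polynomials(y):
    """Values of pi_1, ..., pi_4 at y (orthogonal and standardized on (0, 1))."""
    u = y - 0.5
    return [
        math.sqrt(12.0) * u,
        math.sqrt(5.0) * (6.0 * u ** 2 - 0.5),
        math.sqrt(7.0) * (20.0 * u ** 3 - 3.0 * u),
        210.0 * u ** 4 - 45.0 * u ** 2 + 9.0 / 8.0,
    ]


def psi_squared(ys, k):
    """Sum of squares of the first k standardized sums u_r (assumed form), k <= 4."""
    if not 1 <= k <= 4:
        raise ValueError("only the first four polynomials are coded here")
    n = len(ys)
    # assumed definition: u_r = n^(-1/2) * sum over the sample of pi_r(y_i)
    us = [sum(neyman_polynomials(y)[r] for y in ys) / math.sqrt(n) for r in range(k)]
    return sum(u * u for u in us), us
```

For a sample placed symmetrically about y = 1/2, such as `[0.25, 0.75]`, the first-order term vanishes, so a shift in location is not indicated; the value returned would be referred approximately to $\chi^2$ with k degrees of freedom.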

It may be shown that when n is large, and the constant c in (33) assumes its
limiting form, $\exp[-\tfrac{1}{2}\sum\theta_i^2]$, then Neyman's test criterion (38) is exactly that
which follows from applying to formula (33) the likelihood method of approach
used in the preceding section. In fact it is found that

$$\lambda = e^{-\frac{1}{2}\sum_{r=1}^{k} u_r^2}, \qquad (39)$$

an expression decreasing from 1 to 0 as the $\psi_k^2$ of (38) increases from 0 to $\infty$.

It will be noticed that Neyman's criterion is a sum of polynomial terms in
the $y_i$'s, or more simply, using (36), in the $z_i$'s. The product criteria $Q_1$ and $Q_2$
of equations (5) and (9) may also be expressed in this form. Thus


n 
ating ee eee 
log Q, = — DB {log y;} 


These series do not of course bear any immediate relation to Dr Neyman’s 
polynomial expansions (37). 


5. SUMMARY 

This paper has drawn attention to the somewhat novel character of the
problem to be faced in dealing with tests based on the probability integral transformation.
The intuitional notions that have often served to determine the most
appropriate test when dealing with normal variation are hardly applicable
when we are concerned with a variable following the rectangular distribution.
The tests proposed by R. A. Fisher and K. Pearson have been discussed, and
emphasis has been laid on the need for consideration of the possible alternatives
to the hypothesis tested. The situation will differ according to whether the
problem is one of testing goodness of fit or of combining the results of a number of
independent tests of significance. Some illustration of these ideas has been given
in the case where the hypothesis regarding the form of a probability law p(x)
is incorrect (a) in the position of the mean, (b) in the magnitude of the standard
deviation, (c) in the shape of the probability curve. A method has been suggested
of adapting the product criteria, Q, to meet these different cases.

Finally, a summary has been given of J. Neyman's suggestions for dealing
with the problem. From the theoretical point of view these suggestions appear to
be fundamental in character; it is hoped however that it will be possible before
long to carry out further numerical investigations (a) to determine how large
the number of variables, n, must be to make his results accurate for practical
purposes; (b) to throw more light on the relation between his polynomial form
for $p(y \mid h_1)$, the tests based on $Q_1$, $Q_2$, $Q_3$, ..., discussed in preceding sections and
the classes of alternatives met with in different types of statistical problem.

Note by J. Neyman. I am grateful to the author of the present paper for giving me the
opportunity of expressing my regret for having overlooked the two papers by Karl Pearson
quoted above. When writing the paper on the "Smooth test for goodness of fit" and
discussing previous work in this direction, I quoted only the results of H. Cramér and
R. v. Mises, omitting mention of the papers by K. Pearson. The omission is the more to
be regretted since my paper was dedicated to the memory of Karl Pearson.














ON TESTS FOR HOMOGENEITY 
By B. L. WELCH, Ph.D.


1. INTRODUCTION 
THE present paper is concerned with the familiar $E^2$* and $\chi^2$ tests for homogeneity.
We are given a set of k samples and ask whether they can reasonably be regarded
as having all been drawn from one homogeneous population. Denote the samples
by t = 1, 2, ..., k, let there be $n_t$ observations in the tth sample, and denote these
by i = 1, 2, ..., $n_t$. Then using S to mean $\sum_{t=1}^{k}\sum_{i=1}^{n_t}$ we have

$$E^2 = \frac{S(\bar{x}_{t\cdot} - \bar{x}_{\cdot\cdot})^2}{S(x_{ti} - \bar{x}_{\cdot\cdot})^2}. \qquad (1)$$

A significantly large value of $E^2$ is taken to denote heterogeneity, levels of significance
being deduced from the Beta-function distribution

$$p(E^2) = \frac{1}{B\{\tfrac{1}{2}(k-1), \tfrac{1}{2}(N-k)\}}\,(E^2)^{\frac{1}{2}(k-3)}(1 - E^2)^{\frac{1}{2}(N-k-2)}. \qquad (2)$$

This is the distribution which $E^2$ follows if the k populations sampled are identical
and if, in addition, they are normal. In the application of the test in practice we
are assuming that departures from normality are not such that (2) is much
altered.

It has been argued that the above method of using $E^2$ may not always be
appropriate, depending as it does on the interpretation of the observations as
random samples from an infinite hypothetical population. It may sometimes be
better to consider the observations as samples from a limited population which is
conceived as follows. There are in the aggregate of the k samples $N = \sum_{t=1}^{k} n_t$
observations. These may be divided up into k groups, with $n_t$ in the tth group, in
$N!/(n_1!\, n_2! \ldots n_k!)$ ways. The particular way in which the N observations are
grouped in the samples which we are given, may be regarded as randomly
selected from all the possible ways of grouping these same N observations. We
may calculate a corresponding distribution of values of $E^2$ (discrete of course) to
which the observed $E^2$ may be referred, instead of (2). This point of view would
seem to be particularly appropriate in experimental work where some process of
randomization has actually been carried out. For instance there may be k experimental
treatments and N experimental objects on which to try them out. If the
treatments are assigned to the objects at random, with the sole proviso that each
treatment shall be repeated a certain number of times, then the connection with
the idea of sampling a limited universe is direct.

* $E^2$ is used instead of $\eta^2$ to denote the squared correlation ratio in a sample. This is in accord
with the accepted practice of retaining Greek letters for population parameters. $\chi^2$, however, is
too well established to be replaced by an italic letter.

However, even when the observations are to be interpreted as a sample from
an infinite population, it may still be instructive as a first step to consider the
limited population which can be generated by shuffling the aggregate of N and
redividing into groups of $n_t$ (t = 1, 2, ..., k) in all possible ways. For instance, if
we require the moments of $E^2$ in samples from the infinite population, it may be
convenient to calculate them first for the limited population, and then proceed
by considering all possible limited populations. This is the procedure of the
present paper. The 2-group situation has recently been discussed by E. J. G.
Pitman (1937). What follows links up with this work and also has points of contact
with an older paper on similar topics by R. C. Geary (1927). I shall also refer
to the recent discussions of the $\chi^2$ test for homogeneity when expectations are
small, by W. G. Cochran (1936) and J. B. S. Haldane (1937).


2. SAMPLING A LIMITED POPULATION 

We shall first derive the mean and variance of $E^2$ in samples from the limited
population. Since the denominator of $E^2$ is independent of the way in which the
aggregate of N is divided into groups, we need only consider the mean and
variance of the numerator $S(\bar{x}_{t\cdot} - \bar{x}_{\cdot\cdot})^2 = S_2$ (say).

When we wish to speak of the observations as an undivided set we shall denote
them by $y_j$ (j = 1, 2, ..., N). When we consider the observations divided into k
samples, as they are when given to us, we shall denote them, as hitherto, by $x_{ti}$.
The $x_{ti}$ are regarded as a random partition of the $y_j$ into k groups. For the purpose
of the present section there is no loss of generality in choosing the origin so that
$\sum_j y_j = 0$. (This involves, of course, $\sum_t \sum_i x_{ti} = 0$.) It will be convenient to write as
the second and fourth cumulants of the N y's,

$$K_2 = \frac{\sum_j y_j^2}{N - 1}, \qquad (3)$$

$$K_4 = \frac{N(N+1)\sum_j y_j^4 - 3(N-1)\bigl(\sum_j y_j^2\bigr)^2}{(N-1)(N-2)(N-3)}, \qquad (4)^{*}$$

and to denote expectations over the limited universe by $\mathcal{E}$. Then

$$S_2 = \sum_t \frac{\bigl(\sum_i x_{ti}\bigr)^2}{n_t} = \sum_t \frac{\sum_i x_{ti}^2 + \sum_{i \ne i'} x_{ti}x_{ti'}}{n_t}. \qquad (5)^{\dagger}$$
t Wy t 


* The notation $K_2$ and $K_4$ is used instead of R. A. Fisher's $k_2$ and $k_4$ to avoid confusion with k
which has been used for the number of groups.

† $\sum_{i \ne i'}$, by this convention, contains $n_t(n_t - 1)$ terms, but only $\tfrac{1}{2}n_t(n_t - 1)$ are different.

We note that any term $x^2$ will have the same expectation, viz.

$$\mathcal{E}(x^2) = \frac{\sum_j y_j^2}{N} = \frac{(N-1)K_2}{N}, \qquad (6)$$

and any term of form $xx'$ (i.e. product of two different x's) will have the same
expectation, viz.

$$\mathcal{E}(xx') = -\frac{\mathcal{E}(x^2)}{N-1} = -\frac{K_2}{N}. \qquad (7)$$

To obtain the expectation of $S_2$, it is therefore simply a matter of counting how
many terms of each kind there are in (5) and using (6) and (7). We find

$$\mathcal{E}(S_2) = (k-1)K_2. \qquad (8)$$


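To obtain the variance $V(S_2)$ we have

$$S_2^2 = \sum_t \frac{\bigl(\sum_i x_{ti}\bigr)^4}{n_t^2} + \sum_{t \ne t'} \frac{\bigl(\sum_i x_{ti}\bigr)^2\bigl(\sum_i x_{t'i}\bigr)^2}{n_t n_{t'}}. \qquad (9)$$

This involves terms of five types, viz. $x^4$, $x^3x'$, $x^2x'^2$, $x^2x'x''$ and $xx'x''x'''$. Each
term of a given type has the same expectation. It is sufficient therefore to count
how many terms of each type appear in $\bigl(\sum_i x_{ti}\bigr)^4$ and $\bigl(\sum_i x_{ti}\bigr)^2\bigl(\sum_i x_{t'i}\bigr)^2$. These counts
are shown in Table I.

TABLE I
No. of terms of each type in

Type of term      $(\sum_i x_{ti})^4$                     $(\sum_i x_{ti})^2(\sum_i x_{t'i})^2$

$x^4$             $n_t$                                   -
$x^3x'$           $4n_t(n_t-1)$                           -
$x^2x'^2$         $3n_t(n_t-1)$                           $n_t n_{t'}$
$x^2x'x''$        $6n_t(n_t-1)(n_t-2)$                    $n_t n_{t'}(n_t + n_{t'} - 2)$
$xx'x''x'''$      $n_t(n_t-1)(n_t-2)(n_t-3)$              $n_t n_{t'}(n_t-1)(n_{t'}-1)$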

Making the necessary summations over t, (9) gives

$$\begin{aligned}
\mathcal{E}(S_2^2) ={}& \Bigl(\sum_t \frac{1}{n_t}\Bigr)\mathcal{E}(x^4) + 4\Bigl(k - \sum_t \frac{1}{n_t}\Bigr)\mathcal{E}(x^3x') + \Bigl(k^2 + 2k - 3\sum_t \frac{1}{n_t}\Bigr)\mathcal{E}(x^2x'^2)\\
&+ \Bigl(-2k^2 + 2kN + 4N - 16k + 12\sum_t \frac{1}{n_t}\Bigr)\mathcal{E}(x^2x'x'')\\
&+ \Bigl(N^2 - 2kN + k^2 - 4N + 10k - 6\sum_t \frac{1}{n_t}\Bigr)\mathcal{E}(xx'x''x'''). \qquad (10)
\end{aligned}$$

But, remembering that $\sum_j y_j = 0$ and proceeding as in reaching (7) we have,

$$\begin{aligned}
\mathcal{E}(x^3x') &= -\frac{\mathcal{E}(x^4)}{N-1},\\
\mathcal{E}(x^2x'^2) &= \frac{-\mathcal{E}(x^4) + N\{\mathcal{E}(x^2)\}^2}{N-1},\\
\mathcal{E}(x^2x'x'') &= \frac{2\mathcal{E}(x^4) - N\{\mathcal{E}(x^2)\}^2}{(N-1)(N-2)},\\
\mathcal{E}(xx'x''x''') &= \frac{-6\mathcal{E}(x^4) + 3N\{\mathcal{E}(x^2)\}^2}{(N-1)(N-2)(N-3)}.
\end{aligned} \qquad (11)$$

Also, since $\mathcal{E}(x^2) = \bigl(\sum_j y_j^2\bigr)/N$ and $\mathcal{E}(x^4) = \bigl(\sum_j y_j^4\bigr)/N$ it follows by (3) and (4) that
(11) can be expressed in terms of $K_2$ and $K_4$. Making these substitutions into (10)
we obtain by straightforward algebra

$$\mathcal{E}(S_2^2) = \frac{K_2^2(N-1)(k^2-1)}{N+1} - K_4\left\{\frac{2(k-1)(N-k)}{N(N+1)} + \Bigl(\frac{k^2}{N} - \sum_t \frac{1}{n_t}\Bigr)\right\}. \qquad (12)$$

Whence, by (8),

$$V(S_2) = \frac{2(k-1)(N-k)}{N+1}\left\{K_2^2 - \frac{K_4}{N}\right\} - K_4\Bigl(\frac{k^2}{N} - \sum_t \frac{1}{n_t}\Bigr). \qquad (13)$$

Now

$$E^2 = \frac{S_2}{\sum_j y_j^2} = \frac{S_2}{(N-1)K_2}. \qquad (14)$$

Therefore by (8) and (13)

$$\text{Mean } E^2 = \frac{k-1}{N-1}, \qquad (15)$$

$$V(E^2) = \frac{2(k-1)(N-k)}{(N+1)(N-1)^2}\left\{1 - \frac{K_4}{NK_2^2}\right\} - \frac{K_4}{(N-1)^2K_2^2}\Bigl(\frac{k^2}{N} - \sum_t \frac{1}{n_t}\Bigr). \qquad (16)$$


The mean and variance of $E^2$ for the limited population may usefully be compared
with the mean and variance of $E^2$ derived from (2), which are the appropriate
frequency constants when the samples are interpreted as randomly drawn from
a normal population. From (2)

$$\text{Mean } E^2 = \frac{k-1}{N-1}, \qquad (17)$$

$$V(E^2) = \frac{2(k-1)(N-k)}{(N+1)(N-1)^2}. \qquad (18)$$
eS 2, ee tel 
(15) agrees exactly with (17), and (16) differs from (18) only in the inclusion of a
term $K_4/K_2^2$. This term will be relatively small if N is large enough and the $n_t$'s
not too unequal. (In the particular case where the $n_t$'s are all equal, we shall have
$k^2/N$ equal to $\sum_t (1/n_t)$, and (16) will simplify owing to the vanishing of the last
term.) In these circumstances no essentially different conclusions will be drawn,
whether the samples are regarded as drawn from a limited universe or, as is usual,
from an unlimited normal universe.

When N is large in comparison with k, the Beta-function approximation (2)
is practically equivalent to the assumption that $(N-1)E^2$ is distributed as $\chi^2$
with (k - 1) degrees of freedom. This result has also been found by Geary (1927,
p. 106) following a rather different approach to the problem.
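
For very small N the mean and variance of $E^2$ over the limited universe can be obtained by direct enumeration and set beside the expressions $(k-1)/(N-1)$ and (16). The sketch below is an illustration only: the function names and the observations used are hypothetical, and every ordering of the N values is enumerated, which gives each grouping equal weight.

```python
import itertools


def e_squared(groups):
    """Squared correlation ratio: between-group over total sum of squares."""
    all_x = [x for g in groups for x in g]
    grand = sum(all_x) / len(all_x)
    between = sum(len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups)
    total = sum((x - grand) ** 2 for x in all_x)
    return between / total


def randomization_moments(ys, sizes):
    """Mean and variance of E^2 over all regroupings of ys into the given sizes."""
    values = []
    for perm in itertools.permutations(ys):
        groups, start = [], 0
        for n_t in sizes:
            groups.append(perm[start:start + n_t])
            start += n_t
        values.append(e_squared(groups))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def formula_moments(ys, sizes):
    """Mean (k-1)/(N-1) and the variance (16), in terms of K4/K2^2."""
    N, k = len(ys), len(sizes)
    centre = sum(ys) / N
    s2 = sum((y - centre) ** 2 for y in ys)
    s4 = sum((y - centre) ** 4 for y in ys)
    K2 = s2 / (N - 1)
    K4 = (N * (N + 1) * s4 - 3 * (N - 1) * s2 ** 2) / ((N - 1) * (N - 2) * (N - 3))
    ratio = K4 / K2 ** 2
    spread = k ** 2 / N - sum(1 / n for n in sizes)
    mean = (k - 1) / (N - 1)
    var = (2 * (k - 1) * (N - k) / ((N + 1) * (N - 1) ** 2) * (1 - ratio / N)
           - ratio * spread / (N - 1) ** 2)
    return mean, var


ys = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0]      # hypothetical observations
sizes = (1, 2, 3)
enumerated = randomization_moments(ys, sizes)
predicted = formula_moments(ys, sizes)
```

The two pairs of values should agree to rounding error. Enumeration of this kind is practicable only for very small N, since the number of orderings grows as N!.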

An index, which will show whether (2) approximates closely enough the
distribution of $E^2$ in the limited universe, is provided by the ratio, R, of (16) to
(18), viz.

$$R = 1 - \frac{K_4}{NK_2^2} - \frac{(N+1)K_4}{2(k-1)(N-k)K_2^2}\Bigl(\frac{k^2}{N} - \sum_t \frac{1}{n_t}\Bigr). \qquad (19)$$

The closer this is to unity the better the approximation is likely to be. As an
example, Table II gives values of R corresponding to N = 30, k = 3 and different
values of $n_1$, $n_2$ and $n_3$.

TABLE II

$n_1$    $n_2$    $n_3$    R

10       10       10       $1 - 0.033\,K_4/K_2^2$
5        10       15       $1 - 0.014\,K_4/K_2^2$
2        3        25       $1 + 0.131\,K_4/K_2^2$
1        4        25       $1 + 0.251\,K_4/K_2^2$

The table shows how the last term in (19) becomes relatively important when the
sample sizes are very disparate, although, up to a point, inequality of sample
size has the effect of making R closer to unity. This is due to the fact that
$\dfrac{k^2}{N} - \sum_t \dfrac{1}{n_t}$ is necessarily non-positive.
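
The coefficients of $K_4/K_2^2$ in Table II follow from the sample sizes alone. A short sketch (the function name is hypothetical) computes the coefficient c in $R = 1 + c\,K_4/K_2^2$ from the ratio of the limited-universe variance to the normal-theory variance.

```python
def r_coefficient(sizes):
    """Coefficient c in R = 1 + c * K4/K2^2 for the given sample sizes."""
    N, k = sum(sizes), len(sizes)
    # k^2/N - sum(1/n_t) is zero for equal samples and negative otherwise
    spread = k ** 2 / N - sum(1 / n for n in sizes)
    return -1 / N - (N + 1) * spread / (2 * (k - 1) * (N - k))


table_rows = [(10, 10, 10), (5, 10, 15), (2, 3, 25), (1, 4, 25)]
coefficients = [r_coefficient(row) for row in table_rows]
```

For the four rows of the table the coefficients come out at about -0.033, -0.014, +0.131 and +0.251.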

Apart from the sample sizes, (19) depends only on $K_4/K_2^2$. Now it is possible
to show that $\bigl(\sum_j y_j^4\bigr)/\bigl(\sum_j y_j^2\bigr)^2$ must lie between 1/N and $(N^2 - 3N + 3)/N(N-1)$
and hence that $K_4/K_2^2$ must lie between $-2(N-1)/(N-3)$ and N. Hence limits
may be set to the possible values of R. In particular, when all the $n_t$'s are equal
we see that R must lie between zero and $1 + 2(N-1)/N(N-3)$. With N large,
therefore, there is in this case no possibility that the variance of $E^2$ in the limited
universe will exceed by much the normal theory variance. These results are
similar to those obtained by me in an investigation into the theory of randomized
block experiments, and discussed somewhat fully in a previous number of this
journal (Welch, 1937, p. 28).

When R differs sufficiently from unity to make the normal theory approximation
inadequate, the question will still remain as to what other method of
approximating can be adopted. One such method is to use the true mean and
variance of (15) and (16) and fit a Pearson Type I curve with extremities 0 and 1.
Alternatively expressions for higher moments may be obtained and used. However,
in any attempt to represent the distribution of $E^2$ in the limited universe
by a smooth curve, it must be borne in mind that the distribution is essentially
discrete. Further, it is probable that it is in just those cases where R differs
considerably from unity that the distribution will tend to be most irregular. Any
very elaborate method of fitting a smooth curve may not therefore be justified.
With very small samples it will, of course, be possible to evaluate without great
difficulty sufficient of the discrete distribution of $E^2$ to see where the 5% level
of significance falls. Whether this is worth while depends on the manner in which
the samples are being interpreted.


3. SAMPLING A MORE EXTENDED POPULATION 

One argument for the limited universe approach is that it seems to involve a 
minimum of hypothesis, not assuming anything which is not given directly by the 
observed sample values. Nevertheless the limited universe is still only a mental 
concept. It does not have the same concreteness as a population, say, of un- 
employed workers, from which a certain sample is drawn to provide the basis of 
a social enquiry. This latter population definitely exists and could be sampled 
in its entirety if necessary. But a universe generated by shuffling an observed set 
of samples does not correspond to anything concrete. Only the observed samples 
are really possible. For example, where a randomized field experiment is carried 
out, only the treatment actually used on a plot has a corresponding real yield. 
The other treatments cannot yield figures for that plot at the same time. We can, 
however, make a mental construct, an hypothesis, as to what they might have 
been. The hypothesis may be that on every plot the other treatments would have 
yielded the same as the observed, and this can be tested. The discussion of the 
previous section will then be relevant. But in cases where there has not even 
existed the possibility of the observed individuals being classed into groups, other 
than as they actually are classed, it will not be making any more serious assumptions
to interpret the samples in the usual way, as being drawn from an unlimited
population, rather than from the constructed limited one. In this section, therefore,
we shall consider the appropriate theoretical distribution to which the
observed E² should be referred, when the k samples are regarded as being drawn
randomly and independently from the same infinite population, not, however,
necessarily normal. 

Use can still be made of the results of the previous section, for all the
configurations obtained by shuffling the observed results are still equally likely.
We are, in effect, taking the additional step of regarding the aggregate of N as a
random sample of N.
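The equal likelihood of the shuffled configurations can be checked by brute force on a small aggregate. The sketch below is only an illustration, not the author's computation: the data values, the sample sizes and the function names are hypothetical. It enumerates every ordering of the aggregate, splits each into samples of the stated sizes, and averages E². The average should come out as (k−1)/(N−1) whatever the observed values.

```python
from fractions import Fraction
from itertools import permutations


def correlation_ratio(values, sizes):
    """E^2 = between-sample sum of squares / total sum of squares."""
    n_total = len(values)
    grand_mean = Fraction(sum(values), n_total)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    between_ss = Fraction(0)
    start = 0
    for size in sizes:
        group = values[start:start + size]
        group_mean = Fraction(sum(group), size)
        between_ss += size * (group_mean - grand_mean) ** 2
        start += size
    return between_ss / total_ss


def mean_e2_over_shufflings(values, sizes):
    """Average E^2 over every ordering of the aggregate, all equally likely."""
    ratios = [correlation_ratio(list(p), sizes) for p in permutations(values)]
    return sum(ratios) / len(ratios)


# Hypothetical aggregate of N = 6 observations in k = 2 samples of sizes 2 and 4.
aggregate = [3, 5, 8, 13, 21, 34]
sample_sizes = [2, 4]
mean_e2 = mean_e2_over_shufflings(aggregate, sample_sizes)
print(mean_e2, Fraction(len(sample_sizes) - 1, len(aggregate) - 1))
```

Exact rational arithmetic is used so that the agreement is an identity rather than a rounding coincidence; the enumeration is only practicable for very small N.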


For the infinite universe therefore we have from (15) and (16)

$$\text{Mean } E^2 = \frac{k-1}{N-1}, \tag{20}$$

$$V(E^2) = \frac{2(k-1)(N-k)}{(N+1)(N-1)^2}\,\{\,\cdots\,\}, \tag{21}$$

[the factor in curled brackets is illegible in this copy]


where a_N is used to denote the expectation, in samples of N, of K₄/K₂². (In
formulae (3) and (4), of course, y_i will be replaced by (x_i − x̄), as the mean x̄ is
now allowed to vary.) Note that in the case where the infinite universe is normal,
a_N = 0, and (21) becomes (18). In general a corresponding value of R will be
obtained by replacing K₄/K₂² in (19) by a_N. Since, whatever the sample of N,
K₄/K₂² is forced to lie within certain limits, so also is a_N. If the population sampled
is continuous and of known shape, so that a_N is known, then the distribution of
E² will range continuously from zero to unity with known moments given by
(20) and (21). It may then be approximated by the Type I curve


$$p(E^2) = \text{const.} \times (E^2)^{m_1-1}(1-E^2)^{m_2-1}, \tag{22}$$

where

$$m_1 = \frac{\mu_1(\mu_1-\mu_2)}{\mu_2-\mu_1^2}, \qquad m_2 = \frac{(1-\mu_1)(\mu_1-\mu_2)}{\mu_2-\mu_1^2}, \tag{23}$$

μ₁ and μ₂ being the first and second moments of E² about zero. More generally
a_N will not be known and in that case an unbiassed estimate of it is provided by
K₄/K₂². We should then use (16) instead of (21) in (23). If we judge significance
from levels calculated in such a way, the levels will change with different aggregates
of N and some further investigation is necessary before it can be definitely
stated that in the long run we shall be running the stipulated risk (say 5%) of
rejecting the hypothesis of a common source for the k samples, when it is, in fact,
true. There is, however, no obvious reason to expect much deviation from this
prescribed risk.
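The fitting of a Type I curve by its first two moments can be sketched as follows. This is a minimal illustration under the assumption that the curve is written with exponents m₁ − 1 and m₂ − 1, so that m₁ and m₂ are the usual Beta parameters; the function names are hypothetical. As a check, the normal-theory mean (k−1)/(N−1) and variance 2(k−1)(N−k)/(N+1)(N−1)² should lead back to m₁ = (k−1)/2 and m₂ = (N−k)/2.

```python
from fractions import Fraction


def type_one_exponents(mu1, mu2):
    """Return (m1, m2) of a curve on (0, 1) from its first two moments about zero."""
    variance = mu2 - mu1 ** 2
    if variance <= 0 or not 0 < mu1 < 1:
        raise ValueError("moments are not those of a distribution on (0, 1)")
    common = (mu1 - mu2) / variance  # this is m1 + m2
    return mu1 * common, (1 - mu1) * common


def normal_theory_moments(k, N):
    """Mean and second moment about zero of E^2 when the universe is normal."""
    mean = Fraction(k - 1, N - 1)
    variance = Fraction(2 * (k - 1) * (N - k), (N + 1) * (N - 1) ** 2)
    return mean, variance + mean ** 2


mu1, mu2 = normal_theory_moments(k=20, N=40)
m1, m2 = type_one_exponents(mu1, mu2)
print(m1, m2)
```

When a_N is estimated from the sample, the same routine would be fed the mean and variance of (15) and (16) instead.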


4. THE χ² TEST FOR HOMOGENEITY OF BINOMIAL SERIES
This test can be deduced as a particular case of the E² test by supposing that
x is a variate which equals 1 when the individual has a certain character A, and
equals 0 when the individual does not have the character. Let z_t denote the number
with character A in the tth sample and let Z = Σ z_t. Then

$$\sum_t\sum_i (x_{ti}-\bar{x})^2 = \sum_t\sum_i x_{ti}^2 - \frac{Z^2}{N} = Z\left(1-\frac{Z}{N}\right)$$

and

$$\sum_t n_t(\bar{x}_t-\bar{x})^2 = \sum_t n_t\left(\frac{z_t}{n_t}-\frac{Z}{N}\right)^2,$$

whence

$$E^2 = \frac{\sum_t n_t\left(\dfrac{z_t}{n_t}-\dfrac{Z}{N}\right)^2}{Z\left(1-\dfrac{Z}{N}\right)}.$$


N E² is therefore seen to be equal to the measure of dispersion obtained by
applying the general χ² method of squaring the deviation of each frequency from
its estimated expectation, dividing by this expectation and summing over all
categories. In another terminology N E²/k is the Lexis ratio.* In the present
discussion we shall denote the above measure of dispersion by D, and the Lexis
ratio by L, and we shall suppose that the sample sizes are equal, although this
restriction is not necessary. We then have

$$E^2 = \frac{D}{N} = \frac{L}{n} = \frac{\sum_t (z_t-\bar{z})^2}{k\,\bar{z}\,(n-\bar{z})}, \tag{24}$$

where z̄ is the mean of z_t.

* In yet another terminology E² is equivalent to the mean square contingency φ².
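The relation between E², D and L for equal sample sizes may be illustrated numerically. In the sketch below the counts are hypothetical and the function names are illustrative only. D is computed both from (24) and directly by the χ² recipe of summing (observed − expected)²/expected over both categories of every sample, and the two should agree.

```python
from fractions import Fraction


def dispersion_measures(counts, n):
    """Return (E2, D, L) for k samples of equal size n, following (24)."""
    k = len(counts)
    N = k * n
    z_bar = Fraction(sum(counts), k)
    sum_sq = sum((z - z_bar) ** 2 for z in counts)
    e2 = sum_sq / (k * z_bar * (n - z_bar))
    return e2, N * e2, N * e2 / k


def chi_square_direct(counts, n):
    """Sum of (observed - expected)^2 / expected over the 2 x k table."""
    k = len(counts)
    p_hat = Fraction(sum(counts), k * n)  # estimated chance of character A
    total = Fraction(0)
    for z in counts:
        total += (z - n * p_hat) ** 2 / (n * p_hat)
        total += ((n - z) - n * (1 - p_hat)) ** 2 / (n * (1 - p_hat))
    return total


# Hypothetical data: k = 6 samples of n = 2 individuals each.
counts = [1, 0, 2, 1, 2, 0]
e2, D, L = dispersion_measures(counts, n=2)
print(e2, D, L, chi_square_direct(counts, n=2))
```

Both routines fail, as they should, when every individual or no individual has the character, since the dispersion is then undefined.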

It is known that when the expectations in the samples are large enough, the
distribution of D is well represented by the tabular χ²-distribution with f = (k−1),
but for very small expectations (or at least for very small n) this is known to fail.
As recent discussions of the latter case we may instance those of W. G. Cochran
(1936) and J. B. S. Haldane (1937) (although Haldane is concerned for the most
part with the case where expectations are given a priori and are not, as here,
estimated from the data). The results of the previous sections throw some light
on the conditions under which χ² fails and suggest an alternative procedure
which may be of value.

In the first place we may note that whether we are considering a system of
repetition where the total Z is fixed, or whether we are considering the more
extended population where Z also can vary, we have exactly from (15)

$$\text{Mean } D = N \times \text{Mean } E^2 = \frac{(k-1)N}{N-1}. \tag{25}$$

For the tabular χ², the expectation is (k−1), which suggests, perhaps, that we
should get better agreement with the tabular χ² if we multiply D by (1 − 1/N).
However, as the total number of individuals in all the samples will almost certainly
be large, this is not the main source of discrepancy. Proceeding to the
variance of D, we see, from (16), that its leading term is

$$V(D) = N^2 \times V(E^2) = \frac{2(k-1)N^2(N-k)}{(N+1)(N-1)^2}. \tag{26}$$

For N large this tends to the tabular χ² value 2(k−1), only if k is small compared
with N, i.e. if the individual samples are large enough. Cases where n is too small
occur, for example, where the samples are litters of mice or, as an extreme case,
human twins. In such cases, provided, of course, that k is not also very small, it
appears likely that to refer the E² of (24) to the Beta-function (2) will be a satisfactory
alternative procedure. Stated in a slightly different way, this amounts
to judging significance by means of Fisher's z test, where

$$z = \tfrac{1}{2}\log_e\left\{\frac{\sum_t (z_t-\bar{z})^2}{k-1}\bigg/\frac{k\bar{z}(n-\bar{z})-\sum_t (z_t-\bar{z})^2}{N-k}\right\}, \tag{27}$$

and f₁ = (k−1), f₂ = (N−k).
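The passage from the counts to Fisher's z is short enough to set out in code. The following is a sketch only, with hypothetical counts and an illustrative function name; it assumes equal sample sizes n and returns z together with the two degrees of freedom with which the tables would be entered.

```python
from math import log


def fisher_z_from_counts(counts, n):
    """Return (z, f1, f2) for k binomial samples of equal size n."""
    k = len(counts)
    N = k * n
    z_bar = sum(counts) / k
    sum_sq = sum((z - z_bar) ** 2 for z in counts)
    f1, f2 = k - 1, N - k
    between = sum_sq / f1
    within = (k * z_bar * (n - z_bar) - sum_sq) / f2
    # z is undefined if either mean square vanishes; log or division will then fail.
    return 0.5 * log(between / within), f1, f2


# Hypothetical data: k = 6 samples of n = 2 individuals each.
z_value, f1, f2 = fisher_z_from_counts([1, 0, 2, 1, 2, 0], n=2)
print(round(z_value, 4), f1, f2)
```

Since the ratio of the two sums of squares is E²/(1 − E²), the same z is obtained from ½ log_e{E² f₂ / (1 − E²) f₁}.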
For example, consider the case n = 2, k = 20, N = 40. Suppose we happen to
be sampling a common binomial population whose p = ½, and that sampling is
without restriction on the total Z. The true distribution of the E² of (24) may be
worked out completely. This has been done and the results are presented in
Table III. The possible values of E² have been grouped and the second column
gives the true chance that E² should be equal to or greater than the value E′²
given in the first column. In the third column are given the corresponding
probabilities calculated on the assumption that E² is distributed continuously as
the Beta-function (2), with f₁ = 19 and f₂ = 20.

Bearing in mind the essential discreteness of the true distribution (there are
actually only about 100 distinct values of E² with probability greater than 0.0001),
the approximation would appear to be good. In the fourth column of the table
are given approximations to the same probabilities calculated on the assumption
that D = N E² is distributed as χ² with f = 19. As is expected, the agreement is
not now so good at the tails (which are the most important), the variance of the
tabular χ² being roughly about twice the true variance of D (cf. Cochran, 1936,
p. 214).
TABLE III

  E′²       True            Beta-function    χ²
            P(E² ≥ E′²)     P(E² ≥ E′²)      P(E² ≥ E′²)

  0.00      1.0000          1.0000           1.0000
  0.25      0.9979          0.9868           0.9529
  0.30      0.9830          0.9557           0.8856
  0.35      0.9182          0.8892           0.7837
  0.40      0.7555          0.7778           0.6573
  0.45      0.5902          0.6260           0.5224
  0.50      0.3998          0.4540           0.3946
  0.55      0.2937          0.2902           0.2843
  0.60      0.1962          0.1594           0.1962
  0.65      0.0728          0.0728           0.1302
  0.70      0.0328          0.0263           0.0834
  0.75      0.0125          0.0070           0.0518
  1.00      0.0000          0.0000           0.0033


We may conclude that the distribution of E², used to test the equality of the
means of normal populations, is also useful for judging the significance of the
index of dispersion D, when expected frequencies are small due to n being small.
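For n = 2 the complete enumeration described above is easily mechanised, since each sample can only show 0, 1 or 2 individuals with the character. The sketch below is offered as an illustration rather than as a reconstruction of the original working: the function names are hypothetical, and the two configurations in which E² is indeterminate (Z = 0 and Z = N) are simply left out, which is an assumption about how such cases were treated.

```python
from fractions import Fraction
from math import comb


def exact_e2_distribution(k=20):
    """Exact distribution of E^2 for k samples of n = 2 from a binomial with p = 1/2."""
    dist = {}
    for b in range(k + 1):            # b samples with z_t = 1
        for c in range(k - b + 1):    # c samples with z_t = 2 (the rest have z_t = 0)
            total = b + 2 * c         # Z
            if total == 0 or total == 2 * k:
                continue              # E^2 is 0/0 here; omitted by assumption
            # Multinomial chance with cell probabilities 1/4, 1/2, 1/4.
            prob = Fraction(comb(k, b) * comb(k - b, c) * 2 ** b, 4 ** k)
            z_bar = Fraction(total, k)
            sum_sq = b + 4 * c - k * z_bar ** 2
            e2 = sum_sq / (k * z_bar * (2 - z_bar))
            dist[e2] = dist.get(e2, 0) + prob
    return dist


def upper_tail(dist, level):
    """Chance that E^2 equals or exceeds the given level."""
    return float(sum(p for e2, p in dist.items() if e2 >= level))


dist = exact_e2_distribution()
total_probability = sum(dist.values())
mean_e2 = sum(e2 * p for e2, p in dist.items())
for level in (0.25, 0.50, 0.75):
    print(level, round(upper_tail(dist, level), 4))
```

The printed tail chances may be set beside the "True" column of Table III, though small differences could arise from the grouping used there. The mean of E² over the enumerated cases is exactly (k−1)/(N−1) = 19/39 times their total probability, in agreement with (20).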


5. FURTHER REMARKS 


In the last section only the leading term in the variance of D was taken into
account. It is of theoretical interest to consider the exact expression. Still confining
ourselves to the case n_t = n, and in the first instance considering the case
where the total Z is fixed, we have from (16),

$$V(D) = \frac{2(k-1)N^2(N-k)}{(N+1)(N-1)^2}\left\{1 - \frac{1}{N}\,\frac{K_4}{K_2^2}\right\}.$$

But since

$$\sum y^2 = \sum (x-\bar{x})^2 = Z\left(1-\frac{Z}{N}\right)$$

and similarly

$$\sum y^4 = Z\left(1-\frac{Z}{N}\right)\left(1-\frac{3Z}{N}+\frac{3Z^2}{N^2}\right),$$

we have from (3) and (4)

$$\frac{K_4}{K_2^2} = \frac{\{(N+1)-6NPQ\}(N-1)}{(N-2)(N-3)PQ},$$

where P has been written for Z/N and Q = 1 − P. Therefore

$$V(D) = \frac{2(k-1)N^2(N-k)}{(N+1)(N-1)^2}\left\{1 - \frac{(N-1)(N+1-6NPQ)}{N(N-2)(N-3)PQ}\right\}. \tag{28}$$
It will be clear from this equation that V(D) will depart considerably from the
first term approximation, if either P or Q is very small. The limiting case is
V(D) = 0 when either P or Q = 1/N. In general for N large and P small the
multiplier in the curled bracket of (28) is approximately (1 − 1/NP), which can be
taken to be unity if NP (i.e. the fixed total Z) is large enough. The maximum of
V(D) occurs when P = ½. The multiplier is then 1 + 2(N−1)/N(N−3), but this
will be close to unity for N large.

Considering next the case where Z is no longer fixed but is allowed to vary in
repeated sampling, the variance of D will now be the expectation of (28). This
cannot be written down exactly in terms of the population p and q, but will be
given approximately by the substitution of p and q for P and Q. The exact
expression requires the expectation of 1/PQ.
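The behaviour of the multiplier in (28) at its two extremes can be verified exactly. The sketch below, with illustrative function names, evaluates the multiplier and V(D) for a fixed total Z in rational arithmetic; the small case k = 2, n = 2, Z = 2 is included because it can be checked by hand (E² is 0 with chance 2/3 and 1 with chance 1/3, so that D = 4E² has variance 32/9).

```python
from fractions import Fraction


def variance_multiplier(N, Z):
    """Curled-bracket factor of (28) for a fixed total Z out of N individuals."""
    P = Fraction(Z, N)
    Q = 1 - P
    return 1 - (N - 1) * (N + 1 - 6 * N * P * Q) / (N * (N - 2) * (N - 3) * P * Q)


def variance_of_D(k, n, Z):
    """Exact V(D) of (28) for k samples of equal size n with the total Z fixed."""
    N = k * n
    leading = Fraction(2 * (k - 1) * N ** 2 * (N - k), (N + 1) * (N - 1) ** 2)
    return leading * variance_multiplier(N, Z)


N = 40
print(variance_multiplier(N, 1))    # limiting case P = 1/N
print(variance_multiplier(N, 20))   # P = 1/2, the maximum
print(variance_of_D(2, 2, 2))       # hand-checkable case
```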


6. SUMMARY 

The distribution of the correlation ratio E² has been considered. In the first
instance the mean and variance of E² have been derived for the limited universe
generated by repartitioning the aggregate of all the samples. From here the step
is made to the distribution for an infinite universe, not necessarily normal. Some
light is thrown on the range of applicability of ‘normal’ theory.

The index of dispersion used for testing the homogeneity of binomial series is
treated as a particular case. The χ² distribution is known to be inapplicable to
this index, if the samples are too small. A method of treating this case is suggested.
SOME ASPECTS OF THE PROBLEM OF RANDOMIZATION*


II. AN ILLUSTRATION OF “STUDENT’S” INQUIRY INTO THE
EFFECT OF “BALANCING” IN AGRICULTURAL EXPERIMENTS


By E. S. PEARSON


1. INTRODUCTORY 


In § 4 of his last paper on “Comparison between balanced and random arrangements
of field plots” (“Student”, 1937), the late Mr W. S. Gosset set before his
readers one of those simple yet fruitful ideas which have been so characteristic of
his contributions to statistics during a period of thirty years. The section in
question was entitled “The effect of ‘balancing’ on the ‘validity’ of conclusions”.
The matter dealt with in this and the preceding sections may fairly be said to 
bristle with topics for controversy. Nevertheless the fact that “Student”, who 
had an intense dislike of controversy, felt at last impelled to set down on paper his 
views hereon, is evidence of the strength of his conviction that some protest must 
be made against the claim, so often repeated, that without randomization the 
results of experiments are invalid. 

In the last few months copies of letters exchanged between “Student” and 
his agricultural correspondents scattered over the world have come into my 
possession; they show well what an exceedingly helpful correspondent he was and 
at the same time make clear that he gained much himself from this long-range 
exchange of ideas, as he would himself have been the first to admit. It was there- 
fore interesting to discover from a letter to a friend in Australia, written on 
7 March 1937, the actual date on which “Student” saw in a flash the consequences,
described in the section of his paper to which I have referred, of balancing treatments
in a randomized block type of experiment. The genesis of his idea seems to
have lain in a re-examination of some experimental analyses of uniformity trial
data which Mr A. W. Hudson of Massey College, New Zealand, had sent to him as
long ago as October 1933. “This” [a study of Hudson’s data], he wrote in the
letter to Australia, “put me on to a great truth which should, of course, have been
obvious if one had only thought about it.” And later: “Sorry to bore you with all
this, but I only got hold of it yesterday!”

In the present paper I shall attempt to investigate a little more fully, and in
as objective a manner as possible, the central idea that “Student” had in mind.
In his paper he applied it to the case of the randomized block and the half drill
strip lay-out. I shall deal here with the former problem, and after setting out
rather more fully than he did the algebraic background of his result, shall illustrate
its meaning with the help of diagrams on some of the data used by Mr Hudson in
his Appendix (“Student”, 1937, pp. 376-9).

* For an earlier paper under the same general title, see E. S. Pearson (1937).


2. THE RANDOMIZATION SET OF TREATMENT PATTERNS 


Suppose that in an agricultural experiment designed to compare k “treatments”,
the experimental area is laid out in n blocks each containing k plots.
The yield from the jth plot on the ith block may be denoted by $x_{ij}$

(i = 1, ..., n; j = 1, ..., k),
while the yield from the plot in this block receiving the rth treatment will be
denoted by $x_{i(r)}$ (r = 1, ..., k). Thus the subscript j indicates the position of the
plot, while r indicates the treatment it receives. Were it desired to indicate that the
(ij)th plot receives the rth treatment we could write $x_{ij(r)}$. The analysis of variance
procedure carried out to test whether there are significant treatment differences
will then consist in calculating the sums of squares shown in the following table:


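TABLE I

                                                                                      Degrees of freedom
Treatment   $S_1^{*} = \sum_{r} n\,(x_{.(r)} - x_{..})^2$                                      k − 1
Error       $S_2^{*} = \sum_{i,r} (x_{i(r)} - x_{i.} - x_{.(r)} + x_{..})^2$                   (n − 1)(k − 1)
Total       $S_0^{*} = \sum_{i,r} (x_{i(r)} - x_{i.})^2 = \sum_{i,j} (x_{ij} - x_{i.})^2$      n(k − 1)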


Here $x_{i.}$, $x_{.(r)}$ and $x_{..}$ stand as usual for the block means, the treatment means
and grand mean, respectively. If now there are no treatment differences whatsoever,
and $x_{i(r)}$ may be considered as made up of a block term plus a normal
random residual, say

$$x_{i(r)} = b_i + v_{i(r)} \qquad (i = 1, \ldots, n;\ r = 1, \ldots, k), \tag{1}$$

then the probability distribution of

$$u = \frac{(n-1)S_1}{S_2} \tag{2}$$

is of well-known form,† which may be termed the “normal theory” distribution
of the ratio of two independent estimates of a common variance.

* In what follows S₁, S₂ and S₀ will be used to denote these sums of squares only in the case where
there are no real treatment differences, e.g. when the analysis is applied to uniformity trial data.

† For purposes of this discussion it is simpler to deal with the quantity u, rather than with
z = ½ log_e u.

Because experimentalists have been doubtful of the justification of supposing
that the $v_{i(r)}$ of equation (1) would in practice, when there are no treatment
differences, be independent normal residuals, it has been customary to emphasize
the importance of randomly assigning the treatments to the k plots within each
block. It will be seen that there are $(k!)^{n-1}$ possible partitions of the nk plots into k
undifferentiated groups of n, such that each group contains a plot in every block;
these may be termed the randomization set of treatment partitions or patterns.
When a pattern has been selected there will still be k! ways of laying down k specific
treatments; the first treatment, say A₁, may be placed on any one of the k groups
of plots, A₂ on any one of the remaining k − 1 groups, and so on. There are therefore
in all $(k!)^{n}$ possible arrangements* of k treatments. One of these arrangements
will have been selected for the experiment. In order to test the hypothesis that the
treatments are equivalent, the value of u of equation (2), resulting from this
experiment, may then be referred to the set of $(k!)^{n-1}$ values which would be
obtained if all the treatment patterns of the randomization set were applied to
the observed plot yields $x_{ij}$. This set of values of u constitutes what may be termed
the distribution of u under randomization. As Eden & Yates (1933) suggested experimentally
and Welch (1937) and Pitman (1937b) have shown by more extensive
investigation, if there are no real treatment differences the distribution of u under
randomization is unlikely to differ seriously from the normal theory form. The
total sum of squares S₀ of Table I will be the same in every case, but the apportionment
of the total into the parts S₁ and S₂ will vary according to the pattern used.
As an illustration of the points under discussion, I have shown in Table II
two of the treatment patterns (or arrangements)† applied by Hudson‡ to Mercer
and Hall’s uniformity trial data for mangolds (see “Student” (1937), Appendix,
Table I, 2nd row). There are four hypothetical treatments a₁, a₂, a₃ and a₄
arranged in 10 blocks; the 40 plot yields given in pounds are shown below the treatment
letters. The first arrangement was obtained by Hudson randomly, the second
is a balanced arrangement; both patterns associated with the arrangements belong
of course to the set of (4!)⁹ possible patterns of the randomization set. Comparable
with Table I, we have the analyses of variance shown in Table III.

It is seen that S₁ is considerably smaller for the balanced than for the random
arrangement; consequently u is also smaller in the former case. Neither value of
u is however significant. “Student” emphasized the fact that out of the randomization
set, balanced arrangements would on the whole be associated with the
smaller values of S₁ and consequently larger values of S₂; in other words balancing


* The distinction between the number of treatment patterns and the total number of arrangements
is of no importance when treatment differences do not exist. Its meaning when they are
present will be discussed more fully in § 3 below.

† These are “patterns” if we think of a₁, a₂, ..., as mere indices of the plot groups; they are
“arrangements” if we associate them with specific treatments, e.g. a₁ = A₁ = sulphate of ammonia,
a₂ = A₂ = nitrate of soda, etc. Clearly there would be 4! = 24 ways of associating the indices a with
the real treatments A. In so far as we are assigning hypothetical treatments to uniformity trial
data the distinction is of no importance, and following “Student’s” terminology we shall speak in
this section of “arrangements”.

‡ I should like to thank Mr Hudson very warmly for looking out his original working sheets and
forwarding them to me from New Zealand. I am also glad of the opportunity of making further
use of computations into which he must have put an immense amount of labour a few years ago.
TABLE II


Hudson’s allocation of treatments, I, 2


Random arrangement Balanced arrangement 





| 
a, as a, a, a, My as Ay 
1401 1403 1339 1373 1401 1403 1339 1373 
| 
| as a, Ay a, a ee he ek 
| 1312 1335 1334 1325 1312 1335 1334 1325 
a, a, as as as as ay a, 
} 1337 1325 1310 1264 1337 | 1325 1310 1264 
a, a, a, a, A, } a, ay as 
| 1397 1332 1304 1295 1397 1332 1304 1295
| _&% a, a, a, ay a, a3 | Qs 
% | 1380 1314 1309 1387 1380 | 1314 1309 1387 R 
ca a, as a, ay Ay as a, a, = 
1373 1260 1314 1375 1373 1260 1314 1375 
| a3 As ay A, as As ay a, 
| 1388 1272 1222 1272 1388 1272 1222 1272 
As a, as as a, a, ay as 
1268 1290 1268 1333 1268 1290 1268 1333
| a, as Ms, Ag ay a, as As | 
1310 1293 1274 1321 1310 1293 1274 1321 
| a, as ay As Ay As A, a, 
| 1276 1239 1215 1175 1276 1239 1215 1175 
North                                        North

        Random arrangement                   Balanced arrangement
        Plot means   Deviation from          Plot means   Deviation from
                     grand mean                           grand mean
a₁      1323.0       +10.2                   1312.0       −0.8
a₂      1308.7       −4.1                    1321.1       +8.3
a₃      1300.6       −12.2                   1310.9       −1.9
a₄      1319.1       +6.3                    1307.4       −5.4
Grand mean 1312.85                           Grand mean 1312.85

TABLE III
Significance levels for u: 5%, 2.96; 1%, 4.60.

                  Degrees of    Sum of squares           Mean squares
                  freedom       Random     Balanced      Random      Balanced
Treatment   S₁    3             3093.7     1022.9        1031.2      341.0
Error       S₂    27            54113.3    56184.1       2004.2      2080.9
Total       S₀    30            57207.0    57207.0       u = 0.514   u = 0.164
would tend to reduce the bias in the treatment means, $x_{.(r)}$, due to soil heterogeneity.
The result of Hudson’s investigation bore out this contention; out of fifteen
experiments the balanced lay-outs gave a smaller S₁ than the random on twelve
occasions, the reduction being very considerable in some cases. As a consequence,
when there are no real treatment differences, the distribution of the ratio of
estimates of variance, u, is unlikely for balanced arrangements to follow even
approximately the normal theory form. There is certainly no harm in this result
when treatment differences do not exist, for nothing is gained by believing once
in twenty times that a difference exists when it does not. The real question,
however, is what effect will the tendency of obtaining larger values of S₂ among
balanced arrangements have upon the efficiency of the test in detecting the presence
of real treatment differences when they exist? It was on this point that
“Student’s” work has thrown new light.
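The entries of Table III can be recovered from the plot means of Table II together with the total sum of squares. The sketch below is a check on the arithmetic, not part of the original analysis; the function name is illustrative. It takes the four treatment means of each arrangement as printed (to one decimal place) and the total S₀ = 57207.0, and forms S₁, S₂ and the ratio u.

```python
def analysis_from_treatment_means(means, n_blocks, total_ss):
    """Return (S1, S2, u) from the k treatment means and the total sum of squares S0."""
    k = len(means)
    grand = sum(means) / k
    s1 = n_blocks * sum((m - grand) ** 2 for m in means)
    s2 = total_ss - s1
    u = (s1 / (k - 1)) / (s2 / ((n_blocks - 1) * (k - 1)))
    return s1, s2, u


S0 = 57207.0  # total sum of squares quoted in Table III
random_means = [1323.0, 1308.7, 1300.6, 1319.1]
balanced_means = [1312.0, 1321.1, 1310.9, 1307.4]

s1_random, s2_random, u_random = analysis_from_treatment_means(random_means, 10, S0)
s1_balanced, s2_balanced, u_balanced = analysis_from_treatment_means(balanced_means, 10, S0)
print(round(s1_random, 1), round(s2_random, 1), round(u_random, 3))
print(round(s1_balanced, 1), round(s2_balanced, 1), round(u_balanced, 3))
```

Since the means are given only to one decimal, agreement with the printed sums of squares can be expected only to about the last figure shown.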


3. “STUDENT'S” METHOD OF COMPARING THE EFFICIENCY OF BALANCED 
AND RANDOM ARRANGEMENTS 

In dealing with the position when real treatment differences exist, it is necessary
to extend somewhat the ideas and notation discussed in the preceding
section. It will be noticed that in laying down the experiment an opportunity for
choice has occurred at two stages:

Stage 1. It has been necessary to select a particular treatment pattern out of
the randomization set of $(k!)^{n-1}$ patterns. Two such patterns were shown in
Table II, the particular groups of plots to be associated with the same treatment
being indexed by the letters a₁, a₂, ..., a_k. These may be conveniently described
as dummy treatments.

Stage 2. It is further necessary to decide how to associate the k specific treatments
under investigation, say A₁, A₂, ..., A_k, with the dummy treatments
a₁, a₂, ..., a_k. There will be k! ways of doing this. If there are no real treatment
differences, as when applying a hypothetical treatment pattern to uniformity
trial data, it is immaterial which of the k! alternative ways of associating the a’s
and A’s we make, but in actual practice when laying out an experiment this
second choice must be made, and presumably it will be quite randomly made.*

* The experimentalist choosing a random arrangement will no doubt often combine the two
stages and make a single choice. If, however, at the first stage he selects some pattern, say, from
a printed series, the second choice has still to be made.

“Student’s” approach to the problem was as usual very simple; it consisted
essentially in two steps. In the first place he suggested that the position could be
explored by regarding the plot yields as represented by what amounts to the
following symbolic equation:

$$x_{i(r,s)} = m_{i(r)} + \delta_s. \tag{3}$$

Here $x_{i(r,s)}$ represents the yield from that plot in the ith block which at stage 1,
in choosing the pattern, was assigned dummy treatment $a_r$ and at stage 2
received the real treatment $A_s$. It is built up of two additive parts; the first part,
$m_{i(r)}$, is associated with the plot in the ith block to which $a_r$ has been assigned and
would be the same whatever real treatment were applied; the second part, $\delta_s$,
would be the same for treatment $A_s$ on all plots. The two subscripts r and s have
been introduced to indicate that at stage 2 there are k! ways of associating
$\delta_1, \ldots, \delta_s, \ldots, \delta_k$ with the plots indexed by the dummy treatments $a_1, \ldots, a_r, \ldots, a_k$
of a particular treatment pattern. It is seen from equation (3) that the term $m_{i(r)}$
will vary from plot to plot in exactly the same manner as would be found in a
uniformity trial. If we suppose $\sum(\delta_s) = 0$, then $m_{i(r)}$ is the average of the yields
we should expect to find if it were possible to apply all k treatments in turn to a
plot under the same conditions.

Clearly an assumption is involved in equation (3), since it is supposed that the
contribution $\delta_s$ is the same on all plots treated with $A_s$, whereas in fact there might
well be some interaction between the treatment and the soil characteristics of a
plot. Again since only one treatment will in fact be applied to any single plot, all
combinations of $m_{i(r)}$ and $\delta_s$, except one, will be hypothetical. It must be remembered,
however, that any probability statements whatsoever that can be
made regarding the test criterion u must depend on the construction of some
conceptual model of this kind, and “Student’s” set-up needs no special pleading.

If now we write $x_{.(r,s)}$ for the mean yield of plots receiving $A_s$, when this treatment
is associated with the dummy treatment $a_r$ of a particular randomization
pattern; $m_{.(r)}$ as the mean value of $m_{i(r)}$ on these plots; and other mean values as in
§ 2 above, we shall have

$$x_{.(r,s)} = m_{.(r)} + \delta_s, \qquad x_{i.} = m_{i.}, \qquad x_{..} = m_{..}. \tag{4}$$

The analysis of Table I applied to the x’s will give:


TABLE IV

                                                                                Degrees of freedom
Treatment   $S_1' = \sum_{r} n\,(m_{.(r)} - m_{..} + \delta_s)^2$                        k − 1
Error       $S_2 = \sum_{i,r} (m_{i(r)} - m_{.(r)} - m_{i.} + m_{..})^2$                 (n − 1)(k − 1)
Total       $S_0' = \sum_{i,r} (m_{i(r)} - m_{i.} + \delta_s)^2$                         n(k − 1)

Here the error term, S₂, depends only on the m’s and its behaviour under
randomization at stage 1 we have already discussed in the preceding section. The
treatment term S₁′* breaks up into three parts, the first of which is

$$S_1 = \sum_{r} n\,(m_{.(r)} - m_{..})^2,$$

also depending only on the m’s. These two terms will, on “Student’s” hypothesis,
remain the same whatever the combination at stage 2.

* In this notation the convention referred to in connexion with Table I is being retained;
namely S₁ and S₂ relate to sums of squares of terms containing no real treatment differences.
The test criterion u may now be expressed as

$$u = \frac{(n-1)S_1 + n(n-1)\left\{\sum_s \delta_s^2 + 2\sum (m_{.(r)} - m_{..})\,\delta_s\right\}}{S_2}. \tag{5}$$

The second step in “Student’s” approach was as follows; by selecting
a balanced pattern, the random element has been removed at stage 1, but a
random choice remains at stage 2. Thus starting from a basic set of m’s, and a
given treatment pattern, there will still be k! possible values of u depending on
the way in which the elements in the product-sum in the numerator on the right-hand
side of equation (5) are associated. Any one of the values of u will be equally
likely to arise, on his hypothesis, since the treatment terms $\delta_s$ will bear no relation
to the terms $m_{.(r)} - m_{..}$ representing bias due to soil heterogeneity.
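The k! equally likely values of u at stage 2 can be listed directly from equation (5). In the sketch below the biases $m_{.(r)} - m_{..}$ and the sums of squares are those of the random arrangement of Tables II and III (biases recomputed from the printed plot means), while the treatment effects are purely hypothetical figures chosen to sum to zero; the function name is illustrative.

```python
from itertools import permutations


def u_values_at_stage_two(bias, deltas, n, s1, s2):
    """Equation (5) evaluated for every pairing of treatment effects with plot groups."""
    sum_sq_delta = sum(d * d for d in deltas)
    values = []
    for paired in permutations(deltas):
        cross = sum(b * d for b, d in zip(bias, paired))
        values.append(((n - 1) * s1 + n * (n - 1) * (sum_sq_delta + 2 * cross)) / s2)
    return values


n = 10
bias = [10.15, -4.15, -12.25, 6.25]   # m.(r) - m.. from the plot means of Table II
s1, s2 = 3093.7, 54113.3              # random arrangement, Table III
deltas = [15.0, 5.0, -5.0, -15.0]     # hypothetical treatment effects, in pounds

u_values = u_values_at_stage_two(bias, deltas, n, s1, s2)
print(len(u_values), round(min(u_values), 3), round(max(u_values), 3))
```

Because both the biases and the treatment effects sum to zero, the cross-product term averages out over the 24 pairings, so the mean of the 24 values depends only on S₁, S₂ and the sum of squares of the effects.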

In § 4 of his paper, “Student” has used the following terminology:

$$\sigma_a^2 = \frac{S_1}{n(k-1)}, \qquad \sigma_c^2 = \frac{S_2}{n(n-1)(k-1)}, \qquad \sigma_t^2 = \frac{\sum_s \delta_s^2}{k-1}, \tag{6}$$

and has spoken of these as (i) the actual variance of error, (ii) the calculated variance
of error, and (iii) the real variance due to treatment. In this notation we may
write

$$u = \frac{\sigma_a^2 + \sigma_t^2 + 2\sigma_a\sigma_t\,r}{\sigma_c^2}, \tag{7}$$

where r is the coefficient of correlation between $m_{.(r)} - m_{..}$ and $\delta_s$. It should be
noted that

$$n(k-1)\sigma_a^2 + n(n-1)(k-1)\sigma_c^2 = \sum_{i,r}(m_{i(r)} - m_{i.})^2 = S_0, \tag{8}$$

where, for all the randomization sets of a given series of m’s, S₀ is constant.

The existence of treatment differences will be detected when u falls beyond
the particular significance level chosen. To show the effect of balancing on the
efficiency of the test, “Student” took the case k = 4, n = 6 and supposed
it possible to pick out from the (4!)⁵ possible arrangements of treatments three
which, when applied to the basic m’s, made

$$\begin{aligned}
&(a)\quad \sigma_a^2 = \sigma_c^2 = S_0/n^2(k-1) = \sigma^2 \ \text{(say)},\\
&(b)\quad \sigma_a^2 = 0.5\sigma^2, \qquad \sigma_c^2 = 1.1\sigma^2,\\
&(c)\quad \sigma_a^2 = 1.5\sigma^2, \qquad \sigma_c^2 = 0.9\sigma^2.
\end{aligned} \tag{9}$$

It will be noticed that (b) and (c) as well as (a) satisfy equation (8) which, for
k = 4, n = 6, becomes

$$18\sigma_a^2 + 90\sigma_c^2 = S_0 = 108\sigma^2. \tag{10}$$
The variation in the u of equation (7) will depend on the variation in r under
randomization at stage 2. The distribution of this coefficient is of the type which
we should find if we took two series of numbers, say u₁, u₂, u₃, u₄ and v₁, v₂, v₃, v₄,
for which $\sum(u_i) = 0 = \sum(v_j)$, and calculated the 24 possible correlations

$$\frac{\sum (u_i v_j)}{\sqrt{\sum (u_i^2)\,\sum (v_j^2)}}$$

arising from the 24 possible pairings of the u’s and v’s. “Student” supposed*
that the distribution would be that of a correlation coefficient in a sample of four
from a normal bivariate population with correlation ρ = 0, that is to say that r
would be equally likely to assume any value between −1 and +1. On this
assumption he was able to calculate readily the chance that the u defined in
equation (7) would fall beyond the 5% significance level (in this instance at 3.287).
The results of these calculations for the cases (a), (b) and (c) are shown in the
table on p. 373 of his paper. These chances of detection are shown, for the extreme
cases (b) and (c), in my Fig. 1; in this presentation of the results two points should
be noted:

* While this is not strictly true, Pitman (1937a) has shown how rapidly the distribution of r
under randomization approaches that of normal theory as the number of elements is increased.
See the case he illustrates with k = 5.
(1) I have taken as the measure of real treatment differences

$$\theta = \frac{\sigma_t}{\sigma}\sqrt{\frac{k-1}{nk}}, \tag{11}$$

which is the ratio of the standard deviation of treatment differences, to the
estimate of the standard error per plot (the standard deviation of the $v_{i(r)}$’s of
equation (1)) which we should obtain when $\sigma_a = \sigma_c$.

(2) I have described the chance of detection of treatment differences using
a given test of significance as the “power” of the test, and the curves as “power
functions”. This I have done to conform with the terminology used by J. Neyman
& E. S. Pearson (1936), in discussing this aspect of tests of statistical hypotheses
from the general theoretical view-point. The third curve added to the diagram
and described as that of normal theory, will be referred to in § 5 below.

Since “Student” had pointed out that balancing was likely on the average
to give lower values to $\sigma_1^2 = S_1/n(k-1)$ than a random assignment of treatments,
his conclusions may be simply illustrated on this diagram. The curves, which
represent the chance of detecting treatment differences plotted against $\theta$, will
rise more and more steeply the smaller $S_1$ is. Should $S_1 = 0$, the curve becomes in
the limit a vertical line rising from the point* $\theta = \{u_{0.05}(k-1)/k(n-1)\}^{1/2}$, which
in the present example is 0.702, $u_{0.05}$ being the 5% significance level. A steep
curve is associated with a zero chance of detecting small treatment differences,
but as $\theta$ increases it will lead to a chance approaching unity more rapidly than for
a curve of lesser slope. The two dotted curves in the diagram cross at about
$\theta = 0.82$. The properties of these steeper curves are therefore likely to be associated
with balanced lay-outs. How far these properties are advantageous or
otherwise will be discussed later.
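
The limiting abscissa quoted above can be checked numerically. The sketch below is only an illustration: the values $k = 4$ and $n = 6$ are assumptions (they are not stated in this passage, but they are consistent with 3.287 being the 5% point of the variance ratio for 3 and 15 degrees of freedom), and the function name is hypothetical.

```python
import math

def vertical_limit_theta(u_crit: float, k: int, n: int) -> float:
    """Abscissa at which the power curve becomes a vertical line when S_1 = 0."""
    return math.sqrt(u_crit * (k - 1) / (k * (n - 1)))

# Assumed design: k = 4 treatments in n = 6 blocks (not stated in the passage).
theta_limit = vertical_limit_theta(3.287, k=4, n=6)
print(round(theta_limit, 3))  # 0.702
```

Under these assumptions the result agrees with the value 0.702 given in the text.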


4. FURTHER ILLUSTRATIONS USING HUDSON’S DATA 
The practical implications of “Student’s” argument will clearly depend on
how far a difference between $\sigma_1^2$ and $\sigma_2^2$ of the magnitude indicated in equation (9),
case (b), is likely to follow from balancing the treatments in the blocks. To
investigate this point it seemed desirable to apply his method to certain of the
treatment arrangements used by Hudson. The process which I have followed
consists, in effect, of building up hypothetical trials by adding treatment differences,
$\delta_s$, as in equation (3), to the uniformity trial plot yields used by Hudson, which
will now be the $m$’s in the notation of § 3. The result may be first illustrated on
the example set out in Tables II and III above, for which $k = 4$, $n = 10$.
Instead of using the expression for $u$ in the form (7), let us return to the form
(5). For any given set of four values $m_{\cdot(r)} - m_{\cdot\cdot}$ ($r = 1, ..., 4$), such as those for the
random arrangement of Table II, namely,
+162, -41, -122, +63, ...... (12)


* This follows from setting $\sigma_1 = 0$ in equations (7) and (8) and then using (11).
and a set of four real treatment differences $\delta_s$, such as
[values illegible in the source], ...... (13)
there will be 4! = 24 different values of the numerator on the right-hand side of
equation (5). These correspond to the 24 ways in which the series (12) and (13)
may be paired to form $\Sigma(m_{\cdot(r)} - m_{\cdot\cdot})\delta_s$. Any one of these may be regarded as
equally likely to arise in practice, since the assignment of particular treatments
to the plots marked $a_1$, $a_2$, etc., in Table II will be entirely random. Since for the
series of treatment errors (12), $S_2$ is given in Table III as 54113.3, it is easy to
calculate the resulting 24 values of $u$. These values will vary about a mean of

$\bar{u}(\sigma_\delta^2) = (n-1)\{S_1 + n\Sigma(\delta_s^2)\}/S_2$ ...... (14)
$= 0.5143 + 0.006650\,\sigma_\delta^2$,
where $\sigma_\delta^2 = \Sigma(\delta_s^2)/k$. ...... (15)
This straight line is shown in the upper portion of Fig. 2, in a diagram whose
coordinate axes are $u$ and $\sigma_\delta^2$. For a given $\sigma_\delta^2$ and $S_1$ (or $S_2 = S_0 - S_1$), the variation
of the 24 values of $u$ about their mean is proportional to $\sigma_\delta$. Retaining the same
relative magnitude and sign for the $\delta_s$, as given in (13), but using an adjustable
scaling factor, it was easy to calculate the 24 values of $u$ appropriate for various
values of $\sigma_\delta^2$. These are shown as arrays of spots in the diagram; the 5% and 1%
significance levels for $u$ have also been drawn.
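
The 24 pairings described above can be enumerated directly. The following is a minimal sketch, not the procedure actually used: the treatment errors are those printed in (12), while the `deltas` list is a purely hypothetical stand-in, because the series (13) cannot be read in the source. It illustrates why the $u$ values scatter about the mean line: when the $\delta_s$ sum to zero, the product sums average to zero over all pairings.

```python
from itertools import permutations

# Treatment errors as printed in (12); the deltas are hypothetical stand-ins
# (they sum to zero), because the series (13) is illegible in the source.
treatment_errors = [162, -41, -122, 63]
deltas = [3, 1, -1, -3]

def product_sums(errors, diffs):
    """All k! values of sum(error_r * delta_s) over pairings of the two series."""
    return [sum(e * d for e, d in zip(errors, perm)) for perm in permutations(diffs)]

sums = product_sums(treatment_errors, deltas)
print(len(sums))              # number of pairings, 4! = 24
print(sum(sums) / len(sums))  # mean product sum over all pairings
```

Only the cross-product term varies between pairings, so the spread of the 24 values of $u$ comes entirely from these sums.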
The same process, using the same set of values for $\sigma_\delta^2$, was applied to the balanced
lay-out of Table II. We now start with treatment errors

[values illegible in the source] ...... (16)
and $S_2$ = 56184.1. The mean of the 24 values of $u$ is now

$\bar{u}(\sigma_\delta^2) = 0.1640 + 0.006412\,\sigma_\delta^2$. ...... (17)
The situation is readily understood from a comparison of the two charts. As
$\sigma_\delta^2 \to 0$ the 24 possible $u$-values close in towards one another, and when $\sigma_\delta^2 = 0$ we
have $u = 0.514$ for the random and 0.164 for the balanced arrangement. Neither
of these values is significant. As $\sigma_\delta^2$ increases some of the $u$’s begin to fall beyond
the 5% level; this occurs sooner in the random than in the balanced case, partly
owing to $\bar{u}(\sigma_\delta^2)$ being larger and partly owing to the greater spread of the 24 $u$’s,
which depends on $S_1$. When, however, $\bar{u}(\sigma_\delta^2)$ for the balanced case falls beyond
the 5% (or 1%) level, the smaller spread in the $u$’s, resulting from the smaller $S_1$,
is advantageous, and the chance of detecting the existence of real treatment
differences is greater than for the random case. A count of the number of values of
$u$ falling beyond the two significance levels for different values of $\sigma_\delta$ leads to the
results shown in Table V, which illustrates the crossing over of the power curves,
previously seen in Fig. 1.


* Note that this quantity $\sigma_\delta$ differs by a constant factor from “Student’s” quantity defined in equation
(6) (iii).
Fig. 2.

The arrangements of treatments shown in Table II are of course only two of the
$(4!)^9$ alternatives of the randomization set. Each of these will have its regression
line
$\bar{u}(\sigma_\delta^2) = u_0 + b\sigma_\delta^2$,
and since as $S_1$ increases, $u_0$ and $b$ increase, these lines will not cut. Approximately
TABLE V

                       Frequency of u above
              5% significance level    1% significance level
$\sigma_\delta^2$      Random   Balanced      Random    Balanced
   100           0        -             -          -
   200           4        0             -          -
   300          10        5             0          -
   400          14        8             5          0
   500          16       13             9          4
   600          16       22        [illegible]     6
   700          21       24            15         10
   800          22        -            16         18
   900          24        -            16         22
  1000           -        -            21         24
  1100           -        -            22          -
  1200           -        -            24          -


5% of the values of $u_0$ will fall beyond the level $u_{0.05} = 2.96$, and 1% beyond
$u_{0.01} = 4.60$. Further, the spread of the 24 $u$’s in the arrays will depend on $S_1$ and
$\sigma_\delta$.

Although this simple method of presenting the situation was not mentioned
in “Student’s” paper as published, it was outlined by him in correspondence on
the subject a few months before his death.

The process of calculating the $k!$ possible product sums of the differences
$m_{\cdot(r)} - m_{\cdot\cdot}$ and $\delta_s$ becomes very lengthy when $k > 4$. Luckily in this connexion
Dr L. J. Comrie and Mr G. B. Hey came to my assistance with a scheme that
could be easily worked with the Hollerith Calculating Machine. It was therefore
possible to carry out the same procedure on a number of Mr Hudson’s random and
balanced lay-outs involving six treatments and, therefore, 6! = 720 possible
product sums. The method by which the data for Figs. 3-6 were obtained is
described in an Appendix. All that need be stated here is that a basic set of
hypothetical treatment differences $d_s$ ($s = 1, ..., 6$) was first selected and the
product sums $\Sigma(m_{\cdot(r)} - m_{\cdot\cdot})d_s$ determined. Then writing $\delta_s = q d_s$, it was easy to
adjust these product sums to correspond with any desired value of $\sigma_\delta$, since
$q = \sigma_\delta/\sigma_d$, where
$\sigma_d^2 = \Sigma(d_s^2)/k$, $\sigma_\delta^2 = \Sigma(\delta_s^2)/k$. ...... (19)

Table VI shows four series of values of $d_s$ which were introduced as described below.
Table VII gives the essential particulars of Hudson’s cases used. In each case $S_1$
for the balanced arrangement is less than for the corresponding random arrangement.
This will certainly not always be the case in practice; my purpose is,
TABLE VI

Values of $d_s$ used in experiments

              Series 1   Series 2   Series 3   Series 4
$d_1$            +6         +5         +5         +5
$d_2$             0         +1         +2         +4
$d_3$            -1          0          0         -1
$d_4$            -1         -1         -1         -2
$d_5$            -2         -1         -2         -3
$d_6$            -2         -4         -4         -3

$\sigma_d$     2.7689     2.7080     2.8867     3.2660
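
As a check on the tabulated $\sigma_d$, the short sketch below recomputes the root mean square of each series from the definition in (19). The lists are the series as they can be read from Table VI; a few entries are hard to read in the source, so they should be treated as assumed readings. The function name is hypothetical.

```python
import math

# Series as read from Table VI (assumed readings where the print is unclear).
series = {
    1: [6, 0, -1, -1, -2, -2],
    2: [5, 1, 0, -1, -1, -4],
    3: [5, 2, 0, -1, -2, -4],
    4: [5, 4, -1, -2, -3, -3],
}

def sigma_d(values):
    """Root mean square of the d_s, i.e. sqrt(sum(d_s^2) / k) as in (19)."""
    return math.sqrt(sum(v * v for v in values) / len(values))

for label, values in series.items():
    # Each series should sum to zero; sigma_d should match the last row of Table VI.
    print(label, sum(values), round(sigma_d(values), 4))
```

Each series sums to zero, and the computed values agree with the printed $\sigma_d$ to within a unit in the fourth decimal place.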


TABLE VII

Data from Hudson’s arrangements of six treatments in randomized blocks

Values of $m_{\cdot(r)} - m_{\cdot\cdot}$

Exp.*      Table II, No. 4        Table III, No. 2     Table III, No. 4        Table III, No. 6
           Random      Balanced   Random   Balanced    Random       Balanced   Random    Random       Balanced
$r$ = 1    +68.2       +11.4      +1.1     -1.4        +2.8 [?]     +3.0       -12.1     [illegible]  [illegible]
      2    +40.5       +5.4       +1.8     +1.3        +0.9         +3.3       -7.2      +39.0        -10.8
      3    -41.6       -17.0      -3.1     +0.9        -5.4         -2.0       +41.0     +10.8        +5.6
      4    -6.3        +2.1       +1.2     +0.7        -1.4         -3.3       +35.8     -16.9        -0.6
      5    -73.3       -5.4       +1.1     0.0         -2.8         0.0        -19.0     -12.5        +1.1
      6    +12.5       +3.6       -2.1     -1.5        -0.1         -1.0       -38.7     -14.6        +7.8

$S_1$      5437.2 [?]  1984.3     330.2    118.1       927.8        283.9      20070.6   9316.2       905.3
$S_2$      [illegible] 86045.0    2676.8   2889.0      [illegible]  8667.5     66375.4   77129.8      85540.7
$S_0$      88029.3                3007.1               8951.4                  86446.0
$n$        4                      16                   8                       4

[The two random arrangements of Table III, No. 6 are labelled (A) and (B) in the source; the order of these labels over the columns is not legible.]

* The table references are those in Hudson’s Appendix (“Student”, 1937, pp. 377-9). Table II deals
with a uniformity trial of sugar beet (Immer, 1932), and Table III with a uniformity trial for potatoes
(Kalamkar, 1932). Note that $n$, as in the text, indicates the number of blocks; $k = 6$ throughout.
however, in the first place to investigate the nature of the differences in the power
function curves which result from differences actually met with in $S_1$ by Hudson
within the randomization set.

In drawing the diagrams the chance of detection of treatment differences, or
the power of the test, might be plotted against $\sigma_\delta^2$ or $\sigma_\delta$. To bring the diagrams into
standardized form and to enable comparison to be made with the normal theory
curves, described below, it would be desirable to take as abscissa the ratio of $\sigma_\delta$
to the true standard error per plot. The latter is, however, unknown, and
all that is possible is to use some estimate of its value. For this I have taken
$\sigma'^2 = S_2/(n-1)(k-1)$, using $S_2$ from the random arrangement, so that $\theta$ in the
diagram is given by
$\theta = \sigma_\delta/\sigma'$. ...... (20)
It might have been better to take $\sigma'^2$ as $S_0/n(k-1)$, as when discussing “Student’s”
hypothetical case on pp. 165-7 above, but this had not struck me until after the
diagrams were drawn. The main point, however, is that the same value of $\sigma'$ must
be used in comparing the efficiency of the random and balanced arrangements.

The four cases considered may now be described in detail. 

Fig. 3 (Hudson, II, 4). Series 2 of the $d_s$ values from Table VI was used. The curves
show the chance of detection of treatment differences for random and balanced
arrangements when the hypothesis $\sigma_\delta = 0$ is rejected if (i) $u > u_{0.05} = 2.901$,
(ii) $u > u_{0.01} = 4.556$. Since for the random arrangement in the original uniformity
trial
$u_0 = (S_1/5)/(S_2/15) = 4.85$,
and therefore lies beyond the 1% limit, the form of the power curve is peculiar.
Using the 5% level of significance, we are certain to detect differences between
$\theta = 0$ and $\theta = 0.24$; larger differences will sometimes be overlooked though the
chance is never less than 9 to 1 against this. When $\theta > 1.66$ we shall again be certain
of detecting differences. For the balanced arrangement, using the 5% level, we
shall detect no differences until $\theta = 1.07$. From this point the curve rises rapidly
and when $\theta > 1.43$ treatment differences will be certainly detected. It will be
noticed that “certainty” is secured for the balanced at a slightly lower value than
for the random arrangement; this is what “Student” expected, but he had not
perhaps realized the peculiar nature of the power curve for lower values of $\theta$ in
this case where $u$ is significant for $\sigma_\delta = 0$.

The curves shown result, of course, from the particular series of basic $d_s$ values
used, namely series 2 of Table VI. To examine what change in the curves would
result if the distribution of real treatment differences were changed, similar
calculations were made using series 1. This series has a single exceptionally high
value, $d_1$, the other five values being close together. Table VIII shows a comparison
of the chances of detection for corresponding values of $\theta = \sigma_\delta/\sigma'$; there is seen to be
relatively little difference between the figures in the corresponding columns
TABLE VIII

Random arrangement: chance of detection using

               5% level           1% level
               d-series           d-series
$\theta$       1        2         1        2
0.00         1.000    1.000     1.000    1.000
0.05         1.000    1.000     0.886    0.913
0.10         1.000    1.000     0.744    0.740
0.15         1.000    1.000     0.694    0.674
0.25         1.000    0.997     0.684    0.665
0.40         0.940    0.936     0.694    0.675
0.60         0.906    0.914     0.744    0.739
0.80         0.906    0.918     0.780    0.806
1.00         0.933    0.936     0.822    0.858
1.20         0.967    0.965     0.890    0.914
1.40         1.000    0.983     0.951    0.943
1.60         1.000    0.998     0.997    0.978
1.80         1.000    1.000     1.000    0.994

Balanced arrangement: chance of detection using

               5% level           1% level
               d-series           d-series
$\theta$       1        2         1        2
1.10         0.000    0.039       -        -
1.15         0.163    0.186       -        -
1.20         0.467    0.414       -        -
1.25         0.672    0.617       -        -
1.30         0.799    0.767       -        -
1.35         0.833    0.878     0.000    0.000
1.40         0.933    0.967     0.000    0.024
1.45         1.000    1.000     0.100    0.125
1.50           -        -       0.333    0.315
1.55           -        -       0.621    0.536
1.60           -        -       0.761    0.732
1.65           -        -       0.833    0.850
1.70           -        -       0.869    0.939


headed “1” and “2”. In other words, for a given value of $S_1$, the chance of
detection of treatment differences depends mainly on the standard deviation of
the $\delta_s$ and very little on the form in which they are distributed.

Fig. 4 (Hudson, III, 2). Series 3 of the $d_s$ values from Table VI was used; it
differs only very slightly from series 2. The power curves are shown for a random
and balanced arrangement using the 1% significance level for $u$, which for
$k = 6$, $n = 16$ is $u_{0.01} = 3.271$. Owing to the large number of replications, the curves
rise steeply; the cross-over effect is again present.

Fig. 5 (Hudson, III, 4). Series 3 of the $d_s$ values was again used. The power
curves are shown for a random and balanced arrangement using the 5% significance
level for $u$; with $k = 6$, $n = 8$ we have $u_{0.05} = 2.485$. The balanced curve
crosses the random one rather later than in the preceding illustration.

Fig. 6 (Hudson, III, 6). Series 4 of the $d_s$ values was used. In this case Hudson’s
two random arrangements A and B are compared with his balanced arrangement;
in calculating $\theta$ from (20) the estimate $\sigma'$ was taken from the $S_2$ of random
arrangement A. For $k = 6$, $n = 4$, $u_{0.05} = 2.901$. The balanced curve as usual rises
very steeply and crosses the random (B) curve when the chance of detection is
about 0.59. Owing to the small number of replications the chance of recognizing
small treatment differences is in no case great. In the case of random (B), the
calculations were also made using the d-series 2 of Table VI; the change in the
power curve was very small, the two curves twisting about one another with four
points of crossing. This confirms the conclusion suggested in the case of Hudson,
II, 4, that it is the standard deviation of the treatment differences, $\delta_s$, not the
pattern of their distribution, that really controls the situation.*

5. THE NORMAL THEORY POWER CURVES
These curves have been shown as solid lines in Figs. 1 and 3-6. They are
drawn from tabled values given by P. C. Tang (1938). Dr Tang’s work is of
general application to all analysis of variance problems. In the case of randomized
blocks, his results are based on the following assumptions:
(1) The plot yield $y_{is}$ consists of the following additive parts,
$y_{is} = \beta_i + \delta_s + v_{is}$. ...... (21)
(2) In this equation $\beta_i$ is a term constant for the $i$th block ($i = 1, ..., n$), and
$\delta_s$ is a term constant for the $s$th treatment ($s = 1, ..., k$), subject to the condition
$\Sigma(\delta_s) = 0$.
(3) The residuals $v_{is}$ are independent random variables normally distributed
about zero with a common standard deviation $\sigma$.
Starting from this basis it is possible to obtain the chance that

$\dfrac{n(n-1)\sum_s (y_{\cdot s} - y_{\cdot\cdot})^2}{\sum_i\sum_s (y_{is} - y_{i\cdot} - y_{\cdot s} + y_{\cdot\cdot})^2}$ ...... (22)

will exceed a specified significance level for $u$, when the $\delta_s$’s are in fact not zero.
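
The ratio above is the usual variance ratio for randomized blocks and can be computed from a table of plot yields as in the sketch below. The yields are hypothetical and the function name is an assumption; the code only illustrates the arithmetic of the criterion, not the calculation of its power.

```python
def variance_ratio_u(yields):
    """Criterion u for an n-block by k-treatment table (rows are blocks)."""
    n, k = len(yields), len(yields[0])
    grand = sum(map(sum, yields)) / (n * k)
    block_means = [sum(row) / k for row in yields]
    treat_means = [sum(row[s] for row in yields) / n for s in range(k)]
    # Numerator: squared deviations of treatment means from the grand mean.
    treat_ss = sum((t - grand) ** 2 for t in treat_means)
    # Denominator: residual sum of squares after removing block and treatment means.
    resid_ss = sum(
        (yields[i][s] - block_means[i] - treat_means[s] + grand) ** 2
        for i in range(n)
        for s in range(k)
    )
    return n * (n - 1) * treat_ss / resid_ss

# Hypothetical yields: 3 blocks (rows) by 2 treatments (columns).
example = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
u_value = variance_ratio_u(example)
print(u_value)  # 3.0
```

For this small table the treatment mean square is 1.5 and the error mean square 0.5, giving the same ratio of 3.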


* This result might be expected having regard to Pitman’s (1937a) work regarding the distribution
of a correlation coefficient between independent variables under randomization.
Tang’s tables give this chance, associated with significance levels $u_{0.05}$ and $u_{0.01}$,
and suitable values of $k$ and $n$, for increasing values of

$\theta = \sqrt{\Sigma(\delta_s^2)/(k\sigma^2)}$. ...... (23)

The curves shown in the diagrams have been obtained by plotting this chance
of detecting a real difference against $\theta$. Using the 5% significance level the curve
rises from 0.05 at $\theta = 0$ and approaches unity as $\theta \to \infty$. As was pointed out above,
when plotting the results obtained from Hudson’s data, the true value of $\sigma$ is
unknown and the $\theta$ of equation (20), having only an estimate of $\sigma$ in the denominator,
is not strictly comparable with the $\theta$ of (23).

Supposing the power curves for a given series of $d_s$’s and for all the $(k!)^{n-1}$
patterns of the randomization set were obtained and superimposed, they would
form a network of lines.* The normal theory curve would lie somewhere in the
centre of these, but how far for a given $\theta$ its ordinate would be approximately the
average value of the $(k!)^{n-1}$ randomization ordinates, I have at present no idea.
Since (using $u_{0.05}$) when $\theta = 0$, about 5% of the randomization set of ordinates
will be unity (as for the random arrangement in Fig. 3) and the remaining 95%
will be zero, the average will here agree with the normal theory value of 0.05.

6. CONCLUSION 

The main object of this account has been to make the thesis which “Student”
put forward in his last paper as clear as possible with the help of further illustration.
In a subject where there are noted differences of opinion, an unambiguous
presentation of a case is a first requirement. It seems desirable, however, to end by
repeating what appear to be the conclusions which “Student” drew from his
discovery regarding the form of the power curves, representing the chance of
detection of real treatment differences.

In co-operative experiments undertaken at a number of centres, in which as
he emphasized he was chiefly interested, it is of primary concern to study the
difference between two (or more) “treatments” under the varying conditions
existing in a number of localities. Using a similar notation to that of equations
(3) and (4), in a particular local experiment we shall have for treatments “1”
and “2”,

$x_{\cdot 1} = m_{\cdot 1} + \delta_1$, $x_{\cdot 2} = m_{\cdot 2} + \delta_2$, ...... (24)

and hence $x_{\cdot 1} - x_{\cdot 2} = (m_{\cdot 1} - m_{\cdot 2}) + \delta_1 - \delta_2 = m_{\cdot 1} - m_{\cdot 2} + \Delta_{12}$ (say). ...... (25)

The practical problem is then to determine how $\Delta_{12}$ varies from one set of conditions
to another. For this purpose $m_{\cdot 1} - m_{\cdot 2}$, the difference in “Student’s”

* No doubt in their calculation it would be best to take for the estimate of $\sigma^2$, $S_0/n(k-1)$ as
previously suggested.

† The notation has been simplified by supposing that $r = s$ and by omitting the brackets round
the subscript $r$.
terminology of treatment error terms within an experiment, should be as small as
possible. A balanced arrangement of treatments was in his view more likely to
lead to this result than a purely random arrangement. The fact that for such a
balanced scheme the calculated estimate of the standard deviation of $m_{\cdot 1} - m_{\cdot 2}$
would be inaccurate, indeed on the whole somewhat too large, did not worry him;
for he considered that the real problem was to compare $x_{\cdot 1} - x_{\cdot 2}$, the estimate of
$\Delta_{12}$, with its inter, not intra, locality variation. If the error term $m_{\cdot 1} - m_{\cdot 2}$ was in
fact small, although its standard deviation might not be precisely known, values
of $x_{\cdot 1} - x_{\cdot 2}$ would be obtained leading to a consistent interpretation of the situation.
On the other hand, while randomization might enable the standard deviation of
$m_{\cdot 1} - m_{\cdot 2}$ to be determined without bias, this result would be of little value if the
greater fluctuations in this error term made it impossible to interpret the
changes in $x_{\cdot 1} - x_{\cdot 2}$ from one experiment to another.

This was a definite advantage that seemed to be gained from balancing. 
What, “Student” asked, was lost? Would the single experiment, considered by 
itself, be no longer of value? As a result of his investigation he felt satisfied that 
this would not be the case. The balanced experiment admittedly was less likely to 
detect small treatment differences than the random, and in this sense was inferior. 
It would not detect differences at all when there was perhaps a fifty-fifty chance 
that the randomized experiment would do so. Nevertheless it might be argued 
with reason that useful conclusions for the practical agriculturalist regarding 
treatment differences cannot be drawn until they can be based on something 
approaching certainty; in this region when the risk of error is 1 in 10 or less, 
corresponding to the upper portions of the curves in Figs. 3-6, the balanced lay- 
out seemed likely to give a slight advantage. 

Finally, another practical point was always at the back of “Student’s” mind;
the ease with which an experiment could be carried out in the field. His general
conclusions were not limited to the case of randomized blocks but might be
expected to apply in other forms of design.* The randomized treatment pattern
is sometimes extremely difficult to apply with ordinary agricultural implements,
and he knew from a wide correspondence how often experimenters were troubled
or discouraged by the statement that without randomization, conclusions were
invalid. The keynote of his paper may perhaps be summarized as follows: in
weighing up the consequences of using a given experimental design and applying


* It will be realized that a balanced randomized block may in some cases correspond to a
Latin-square lay-out. For example the plan in Table I above contains two 4 × 4 Latin squares.
When this is so the sum of squares, $S_2$, can of course be reduced by subtracting from it a row (or
column) sum of squares. The question will then arise as to whether certain Latin-square patterns
would be preferable to others, on “Student’s” thesis. The “knight’s move” pattern is for example
balanced to a greater extent than a randomly selected pattern will usually be, and as Tedin’s (1931)
work has shown, gives a smaller treatment sum of squares, $S_1$, on the average. The reduction is
however smaller in this case than for randomized blocks, since as “Student” remarked the Latin square
is not only random but balanced “thus conforming to all the principles of allowed witchcraft”.
a statistical test to the results of the experiment, it is of more importance to
consider (i) the chance of detecting real differences when they exist than (ii) the 
risk of concluding that a difference exists when it does not. The term “valid” 
has commonly been associated with a method which ensures a precise knowledge 
of this latter risk, but may not the most valid procedure be in fact one which, 
while giving an upper limit to risk (ii), makes as near certain as possible the 
detection of the larger and therefore most important differences? Whatever 
answer to this question is favoured, “Student’s” last scientific contribution
should be invaluable in forcing on the attention of statisticians and experimenters
the questions here at issue.

APPENDIX
The method of obtaining the data for the curves of Figs. 3-6.
In any given case we start with:
(a) The values of the sums of squares $S$ and of $m_{\cdot(r)} - m_{\cdot\cdot}$ given in Table VII and the values of $d_s$ in
Table VI.
(b) The 720 product sums $\Sigma(m_{\cdot(r)} - m_{\cdot\cdot})d_s$ given by applying Dr Comrie’s
method to the two series of six numbers $m_{\cdot(r)} - m_{\cdot\cdot}$ and $d_s$ ($r, s = 1, 2, ..., 6$).
(c) The relation $\delta_s = d_s\sigma_\delta/\sigma_d$, where $\sigma_d$ and $\sigma_\delta$ are defined by equation (19).
It is then required to determine how many of the corresponding 720 values of the
test criterion $u$ defined in equation (5) will fall beyond the significance levels
$u_{0.05}$ and $u_{0.01}$, for specified values of $\sigma_\delta^2$.
If we write $Q = \Sigma(m_{\cdot(r)} - m_{\cdot\cdot})d_s$, ...... (26)
then the inequality
$u = (n-1)\{S_1 + nk\sigma_\delta^2 + 2n\sigma_\delta Q/\sigma_d\}/S_2 > u_\alpha$,
where $\alpha = 0.05$ or 0.01, may be written

$Q > \dfrac{\sigma_d}{2n(n-1)\sigma_\delta}\{u_\alpha S_2 - (n-1)S_1 - nk(n-1)\sigma_\delta^2\}$. ...... (27)

Given the 720 values of $Q$ printed off on sheets by the Hollerith machine, it was
relatively simple to determine how many of these were greater than specified
numerical values obtained by inserting into the right-hand side of equation (27) a
suitable series of increasing values for $\sigma_\delta$. For the computation and counting
involved I am much indebted to Mr D. J. Bishop of the Department of Statistics
at University College.
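
The counting procedure can be sketched in a few lines of modern code. This is an illustration under stated assumptions, not a reproduction of the Hollerith scheme: the treatment errors and the $d$-series are as read from Tables VII and VI for the random arrangement of Hudson II, 4; $S_1$ is assumed to equal $n$ times the sum of squared treatment errors; and, because the printed $S_2$ is hard to read, `s2_assumed` is chosen so that $u_0$ equals the 4.85 quoted in the text. All function and variable names are hypothetical.

```python
from itertools import permutations
import math

def power_by_randomization(errors, d, n, s2, u_crit, sigma_delta):
    """Fraction of the k! pairings whose criterion u exceeds u_crit.

    errors : treatment error terms m_.(r) - m_..
    d      : basic treatment differences d_s
    s2     : error sum of squares S_2 (supplied by the caller)
    """
    k = len(d)
    s1 = n * sum(e * e for e in errors)   # assumed: S_1 = n * sum of squared errors
    sigma_d = math.sqrt(sum(v * v for v in d) / k)
    q = sigma_delta / sigma_d             # scaling factor, delta_s = q * d_s
    hits = total = 0
    for perm in permutations(d):
        big_q = sum(e * v for e, v in zip(errors, perm))  # product sum Q
        u = (n - 1) * (s1 + n * k * sigma_delta ** 2 + 2 * n * q * big_q) / s2
        hits += u > u_crit
        total += 1
    return hits / total

# Random arrangement of Hudson II, 4 with d-series 2 (assumed readings).
errors = [68.2, 40.5, -41.6, -6.3, -73.3, 12.5]
d_series_2 = [5, 1, 0, -1, -1, -4]
n_blocks = 4
s1 = n_blocks * sum(e * e for e in errors)
s2_assumed = 3 * s1 / 4.85                # chosen so that u_0 = 4.85
sigma_prime = math.sqrt(s2_assumed / 15)  # sigma' from S_2 / (n-1)(k-1)

power_at_zero = power_by_randomization(errors, d_series_2, n_blocks, s2_assumed, 2.901, 0.0)
power_mid = power_by_randomization(errors, d_series_2, n_blocks, s2_assumed, 2.901, 0.4 * sigma_prime)
power_far_out = power_by_randomization(errors, d_series_2, n_blocks, s2_assumed, 2.901, 500.0)
print(power_at_zero, power_mid, power_far_out)
```

Computing $u$ for each pairing directly, rather than comparing $Q$ with the right-hand side of (27), avoids dividing by $\sigma_\delta$ when it is zero; the two forms are otherwise equivalent.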

It should be noted that since there is a finite number of values of $Q$, the power
“curve” is not really continuous, but consists of a series of “steps”. Counts were
however only made for a limited number of values of $\sigma_\delta$, and the points obtained
joined graphically by a smooth curve. When some of the $d$’s in a series have the
same value, i.e. for series 1, 2 and 4, the power “curves” will tend to show greater
sinuosities than otherwise, since the underlying “steps” will be larger.








A GENERALIZATION OF FISHER’S z TEST 


By D. N. LAWLEY, B.A. 


School of Agriculture, Cambridge 


1. Hotelling (1931) has generalized “Student’s” $t$ distribution for the case of
a normal multivariate population and found the distribution of $T$, where $T$ is
defined by the relation

$$T^2 = -\frac{1}{\|a_{ij}\|}\begin{vmatrix} 0 & \xi_1 & \xi_2 & \cdots & \xi_p \\ \xi_1 & a_{11} & a_{12} & \cdots & a_{1p} \\ \vdots & \vdots & \vdots & & \vdots \\ \xi_p & a_{p1} & a_{p2} & \cdots & a_{pp} \end{vmatrix} = \frac{A_{ij}\,\xi_i\,\xi_j}{|A|}$$

($i$, $j$ being summed over the values 1, 2, ..., $p$), where $\xi_1, \xi_2, ..., \xi_p$ and $x_1^{(s)}, x_2^{(s)}, ...,
x_p^{(s)}$ ($s = 1, 2, ..., n$) are $(n+1)$ independent sets of observations of the variates
$x_1, x_2, ..., x_p$, which are distributed in a normal multivariate frequency distribution
with zero means,
$a_{ij} = \frac{1}{n}\sum_{s=1}^{n} x_i^{(s)} x_j^{(s)}$,
and $A_{ij}$ is the cofactor of $a_{ij}$ in the determinant $|A| = \|a_{ij}\|$.


At a later date Wilks (1932) defined a generalized variance and found the 
appropriate A-criteria for testing certain hypotheses concerning the means, 
variances, and covariances of k normal multivariate populations from which k 
independent samples have been drawn. These criteria were developed more fully 
by Pearson and Wilks in a subsequent paper (1933) for the case of two variates, 
but though the sampling distributions obtained were in certain cases relatively 
simple the arithmetical calculations required for practical application were, in 
general, not of a very simple nature.* 

In this paper we shall find a quantity which may be regarded as a generaliza- 
tion of Fisher’s z and which provides a test suitable for dealing with certain 
generalized analysis of variance problems, having the advantage of being easy 
to apply. 


* The actual calculation of the A-criterion which appears to be appropriate in the present
problem is no longer than the calculation of $v^2$ defined below. But when more than two variates
are dealt with, only the sampling moments of the A-criterion seem to be known and the calculation
of these certainly involves considerable arithmetic. It is hoped that a fuller comparison of these
tests may be made in a further contribution to this Journal [Ed.].
2. Using a summation convention for $i$ and $j$* we define

$v^2 = \dfrac{a_{ij}A'_{ij}}{|A'|}$,

where $a_{ij} = \frac{1}{n_1}\sum_{r=1}^{n_1} x_i^{(r)} x_j^{(r)}$, $a'_{ij} = \frac{1}{n_2}\sum_{s=1}^{n_2} \xi_i^{(s)} \xi_j^{(s)}$,
and $x_i^{(r)}$ ($r = 1, 2, ..., n_1$), $\xi_i^{(s)}$ ($s = 1, 2, ..., n_2$)
represent two independent samples containing respectively $n_1$ and $n_2$ sets of
values of the variates $x_i$ ($i = 1, 2, ..., p$) which are distributed in a normal multivariate
frequency distribution with zero means. We shall find the distribution
of $v^2$.

We may suppose that the variates $x_i$ have zero correlations and equal variances,
as otherwise they may always be replaced by linear functions of the $x_i$ having
these properties, and $v^2$ remains invariant under linear transformations.

First consider $r$ to be fixed. Then $(\xi_i^{(1)}, \xi_i^{(2)}, ..., \xi_i^{(n_2)}, x_i^{(r)})$ represent the rectangular
coordinates of $p$ points $P_i$ in a space $V_{n_2+1}$ of $(n_2+1)$ dimensions, $O$ being
the origin and $OX^{(1)}, ..., OX^{(n_2)}, OX$ being the coordinate axes.

Hotelling uses the result that

$T_r^2 = \dfrac{x_i^{(r)} x_j^{(r)} A'_{ij}}{|A'|} = n_2\cot^2\phi_r$,

where $\phi_r$ is the angle made by $OX$ with the flat space $V_p$ contained by the $p$ lines
$OP_i$.

Let $V_{n_2-p+1}$ be the flat space containing all lines through $O$ perpendicular to
$V_p$; then clearly $\cot\phi_r = \tan\theta_r$, where $\theta_r$ is the angle made by $OX$ with $V_{n_2-p+1}$.
Thus $T_r^2 = n_2\tan^2\theta_r$.


p+l1- 


As the quantities $\xi_i^{(1)}, ..., \xi_i^{(n_2)}, x_i^{(r)}$ vary so do the points $P_i$, but they are distributed
about $O$ with spherical symmetry.

We may consider the space $V_{n_2-p+1}$ to remain fixed while the system of axes
varies; then the axis $OX$ moves in such a way that all directions of $OX$ in $V_{n_2+1}$
are equally likely.

Now let $V_{n_2}$ be the flat space contained by the axes $OX^{(1)}, ..., OX^{(n_2)}$. The
intersection of $V_{n_2-p+1}$ with $V_{n_2}$ is a flat space $V_{n_2-p}$ of $(n_2-p)$ dimensions.

It is clear that $V_{n_2-p+1}$ is given by the $p$ equations

$\sum_{s=1}^{n_2} \xi_i^{(s)} X^{(s)} + x_i^{(r)} X = 0$ ($i = 1, 2, ..., p$), ...... (1)

and that $V_{n_2-p}$ is given by the $(p+1)$ equations

$\sum_{s=1}^{n_2} \xi_i^{(s)} X^{(s)} = 0$ and $X = 0$. ...... (2)
s=1 
Now consider the quantities $\xi_i^{(1)}, ..., \xi_i^{(n_2)}$ as fixed while $r$ takes the values
1, 2, ..., $n_1$.


* I.e. when the letters $i$ or $j$ appear twice, they are to be summed for all values.
Then the space $V_{n_2-p+1}$ will alter for different values of $r$ but $V_{n_2-p}$ remains
unaltered, since equations (2) do not involve the quantities $x_i^{(r)}$. Thus when $r$ takes
the values 1, 2, ..., $n_1$, $V_{n_2-p+1}$ rotates about the fixed space $V_{n_2-p}$.

We may consider $V_{n_2-p+1}$ to remain fixed and the system of axes to rotate
about $V_{n_2-p}$; then the projection of the axis $OX$ on $V_{n_2-p+1}$ will remain fixed for all
successive positions $OX^{(1)}, OX^{(2)}, ..., OX^{(n_1)}$ of $OX$, since this is the line through $O$
in $V_{n_2-p+1}$ which is perpendicular to $V_{n_2-p}$.

As the quantities $x_i^{(r)}$ vary the $n_1$ lines $OX^{(r)}$ vary independently in the space
consisting of all lines through $O$ perpendicular to $V_{n_2-p}$.

Let $\theta_r$ be the angle between $OX^{(r)}$ and $V_{n_2-p+1}$. Then

$T_r^2 = \dfrac{x_i^{(r)} x_j^{(r)} A'_{ij}}{|A'|} = n_2\tan^2\theta_r$.

Hence $v^2 = \dfrac{a_{ij}A'_{ij}}{|A'|} = \dfrac{1}{n_1}\sum_{r=1}^{n_1} T_r^2 = \dfrac{n_2}{n_1}\sum_r (\tan^2\theta_r)$.

Hence we have shown that if $l_1, l_2, ..., l_{n_1}$ are $n_1$ lines through $O$ which vary such
that all directions in $V_{n_2+1}$ are equally likely, and are independent except for the
restriction that they all have the same projection $m$ on $V_{n_2-p+1}$, then $v^2$ is distributed
as is $\dfrac{n_2}{n_1}\sum_r (\tan^2\theta_r)$, where $\theta_r$ is the angle between $l_r$ and $V_{n_2-p+1}$.
1 
The distribution of $\sum_r (\tan^2\theta_r)$ will be unaltered if instead of the lines $l_r$ lying
in the same space $V_{n_2+1}$ we suppose that they are in different spaces $V^{(r)}_{n_2+1}$, each
of $(n_2+1)$ dimensions, which intersect in the space $V_{n_2-p+1}$.

We take rectangular axes $OY_i^{(r)}$ ($i = 1, 2, ..., p$ and $r = 1, 2, ..., n_1$) and $OZ_1$,
$OZ_2, ..., OZ_{n_2-p+1}$ containing a space $V$ of $(n_1 p + n_2 - p + 1)$ dimensions:

$V^{(r)}_{n_2+1}$ is the space contained by the $(n_2+1)$ axes $OY_1^{(r)}, ..., OY_p^{(r)}, OZ_1, ...,
OZ_{n_2-p+1}$.

$V_{n_2-p+1}$ is the space contained by the $(n_2-p+1)$ axes $OZ_1, ..., OZ_{n_2-p+1}$.

As the lines $l_r$ have the same projection $m$ on $V_{n_2-p+1}$ we may regard them as the
projections on the spaces $V^{(r)}_{n_2+1}$ of a line $l$ through $O$ which varies in such a way that
all directions of $l$ in $V$ are equally likely.
Now if $\theta$ is the angle between $l$ and $V_{n_2-p+1}$ it may be easily shown that
$\sum_r (\tan^2\theta_r) = \tan^2\theta$.
Hence $v^2$ is distributed as is $(n_2/n_1)\tan^2\theta$.
It is also easy to prove* that $\theta$ is distributed according to the frequency law
$f(\theta)\,d\theta = \text{constant} \times \sin^{n_1 p - 1}\theta\,\cos^{n_2 - p}\theta\,d\theta$. Hence if we put


$Z = \tfrac{1}{2}\log\left\{\dfrac{n_2-p+1}{n_1 p}\tan^2\theta\right\} = \tfrac{1}{2}\log\left\{\dfrac{n_2-p+1}{n_2 p}\cdot\dfrac{a_{ij}A'_{ij}}{|A'|}\right\}$,

* For method of proof see Hotelling (1931).
then $Z$ is distributed in Fisher’s $z$ distribution with degrees of freedom $N_1$ and $N_2$,
where $N_1 = n_1 p$ and $N_2 = n_2 - p + 1$.

When $n_1 = 1$ we have the case of Hotelling’s $T$ distribution.

When $p = 1$ we obtain the ordinary $z$ distribution of Fisher with degrees of
freedom $n_1$ and $n_2$.


> 


When $n_2 = \infty$, $v^2$ is distributed as $\chi^2/n_1$, where $\chi^2$ has $n_1 p$ degrees of freedom,
and takes the form $\alpha_{ij}a_{ij}$, where $\alpha_{ij} = C_{ij}/|C|$, $c_{ij} = E(x_i x_j)$, and $C_{ij}$ is the cofactor
of $c_{ij}$ in the determinant $|C| = \|c_{ij}\|$. The distribution of $\alpha_{ij}a_{ij}$ may
easily be obtained directly by a different method.

$\alpha_{ij}a_{ij}$ gives a measure of the “scatter” of the $n_1$ points whose rectangular
coordinates in a space of $p$ dimensions are $(x_1^{(r)}, x_2^{(r)}, ..., x_p^{(r)})$ ($r = 1, 2, ..., n_1$), and if
the parameters $\{c_{ij}\}$ of the population are known its distribution may be used to
test whether the scatter of this set of points, which represent the given sample
of size $n_1$, is too large to be consistent with the hypothesis that the sample has
been drawn from a normal multivariate population with zero means and parameters
$\{c_{ij}\}$. Usually however the parameters are unknown, in which case the
quantities $\alpha_{ij}$ must be replaced by estimates $A'_{ij}/|A'|$ calculated from a second
sample, and the distribution of $Z$ may then be used to test whether $v^2 = a_{ij}A'_{ij}/|A'|$ is
too large to be consistent with the hypothesis that both samples have been drawn
from the same normal population.

3. We shall show how the Z test may be applied to experiments in which a 
number of different treatments are to be compared by analysis of variance 
methods, and where several variates have been measured. 

We suppose that two independent sets of estimates of the variances and
covariances have been calculated in the usual manner; one set $\{a_{ij}\}$ having been
obtained from the treatment totals and the other set $\{a'_{ij}\}$ from the error differences.
Then it may be shown that if there are $n_1$ degrees of freedom for treatments, and
$n_2$ degrees of freedom for error, and we assume the null hypothesis that there is
no effect due to treatments, then
$$a_{ij} = \frac{1}{n_1}\sum_{r=1}^{n_1}x_i^{(r)}x_j^{(r)} \quad\text{and}\quad a'_{ij} = \frac{1}{n_2}\sum_{s=1}^{n_2}x_i'^{(s)}x_j'^{(s)},$$
where $x_i^{(r)}$ $(r = 1, 2, \ldots, n_1)$ and $x_i'^{(s)}$ $(s = 1, 2, \ldots, n_2)$
represent $(n_1+n_2)$ independent sets of values of variates $x_i$ $(i = 1, 2, \ldots, p)$ which
are distributed in a normal multivariate distribution with zero means.

Hence if we put
$$Z = \tfrac12\log_e\left\{\frac{n_2-p+1}{n_2\,p}\,\frac{a_{ij}A'_{ij}}{|A'|}\right\}$$
as before, then $Z$ is distributed according to the distribution obtained in § 2.
It will be noticed that the form of $Z$ is not symmetrical in the two sets $\{a_{ij}\}$ and
$\{a'_{ij}\}$, so that the two must not be interchanged.


If Z is found to be significantly large it will mean that the set of points whose
coordinates are the sets of treatment means (each point representing a different
treatment) is more scattered than would be expected on the null hypothesis
that the treatments had no effect, i.e. that there are significant differences between
treatments. Of course, in what precedes, for the word “treatments” we may
equally well substitute “blocks” such as are used in a randomized block experiment,
or “rows” and “columns” of a Latin square.

We give a simple example of the application of the Z test for the case where 
p = 2. 
The following figures are taken from an analysis of variance and covariance
of Stand (x) and Yield (y) of sugar beet, given by Snedecor (1937, p. 236):

              D.F.       (x²)        (xy)        (y²)
    Blocks      5      7472.57     −116.56      6.3134
    Error      30     28665.10      682.20     23.2326


We shall test whether there are significant differences between blocks.
Carrying out the ordinary analysis of variance test for the variate $x$ we find
$$z = \tfrac12\log_e\left\{\frac{7472.57/5}{28665.10/30}\right\} = 0.2236,$$
with degrees of freedom 5 and 30.

Similarly for $y$ we find
$$z = \tfrac12\log_e\left\{\frac{6.3134/5}{23.2326/30}\right\} = 0.2446,$$
also with degrees of freedom 5 and 30.

The 5 % significance point of z for these degrees of freedom is 0.4648, hence
neither of the separate analyses of variance of x and y reveals any significant
differences between blocks.


But let us put
$$a_{11} = \frac{7472.57}{5},\quad a_{12} = a_{21} = \frac{-116.56}{5},\quad a_{22} = \frac{6.3134}{5},$$
and
$$a'_{11} = \frac{28665.10}{30},\quad a'_{12} = a'_{21} = \frac{682.20}{30},\quad a'_{22} = \frac{23.2326}{30}.$$
Then
$$\frac{a_{ij}A'_{ij}}{p\,|A'|} = 7.683 \quad (p = 2;\ i, j = 1, 2),$$
$n_1 = 5$ and $n_2 = 30$. Hence
$$Z = \tfrac12\log_e\left(\tfrac{29}{30}\times 7.683\right) = 1.0026,$$
$N_1 = n_1p = 10$ and $N_2 = n_2-p+1 = 29$.

The value of Z obtained is significantly large even at the 0.1 % significance
level, as for degrees of freedom 10 and 29 the 0.1 % point of z is 0.7283, thus the
differences between blocks are shown to be strongly significant.

The largeness of Z is partly accounted for by the fact that the “between
blocks” correlation coefficient (r) of −0.537 differs greatly from the error correlation
coefficient (r′) of +0.836, and also partly by the fact that due to r′ being
large |A′| is small.
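The arithmetic of this example can be retraced with a short sketch. It assumes only the figures quoted from Snedecor and the form of Z given in § 2, restricted to p = 2 so that the cofactors can be written out by hand; the function names `scatter_ratio` and `z_statistic` are illustrative and not part of the paper.

```python
import math

def scatter_ratio(a, a_err):
    """Return a_ij A'_ij / |A'| for symmetric 2x2 matrices given as (a11, a12, a22)."""
    a11, a12, a22 = a
    e11, e12, e22 = a_err
    det_err = e11 * e22 - e12 ** 2
    # Cofactors of the 2x2 error matrix: A'11 = e22, A'22 = e11, A'12 = A'21 = -e12.
    return (a11 * e22 - 2.0 * a12 * e12 + a22 * e11) / det_err

def z_statistic(a, a_err, n2, p=2):
    """Z = (1/2) log_e{ (n2 - p + 1) / (n2 p) * a_ij A'_ij / |A'| }."""
    return 0.5 * math.log((n2 - p + 1) / (n2 * p) * scatter_ratio(a, a_err))

n1, n2 = 5, 30                                    # degrees of freedom: blocks, error
blocks = (7472.57 / n1, -116.56 / n1, 6.3134 / n1)
error = (28665.10 / n2, 682.20 / n2, 23.2326 / n2)

z_x = 0.5 * math.log(blocks[0] / error[0])        # univariate z for stand
z_y = 0.5 * math.log(blocks[2] / error[2])        # univariate z for yield
Z = z_statistic(blocks, error, n2)                # bivariate Z, d.f. 10 and 29

print(round(z_x, 4), round(z_y, 4), round(Z, 4))
```

The printed values should agree with those in the text to about the third decimal place; small differences in the last figure come from rounding of intermediate quantities.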

4. Wilks (1934) has considered the regression of a set of dependent variates
upon another set of independent variates. We shall now show how the distribution
of Z may be used to test the significance of the composite regression of the
dependent variates $\{y_i\}$ $(i = 1, 2, \ldots, p)$ on the independent variates $\{x_r\}$ $(r = 1, 2,
\ldots, m)$.

In what follows we use a summation convention for all lower suffixes. 

Let us suppose that the y's and x's are deviations from means and that the
variate $y_i$ is normally distributed about the linear regression function $\beta_{ir}x_r$, so
that the joint probability law of $\{y_i\}$ is
$$(2\pi)^{-p/2}\,\|\alpha_{ij}\|^{1/2}\,\exp\left\{-\tfrac12\alpha_{ij}(y_i-\beta_{ir}x_r)(y_j-\beta_{js}x_s)\right\}dy,$$
where $\alpha_{ij} = C_{ij}/|C|$ and $c_{ij} = E(y_iy_j)$ for fixed $\{x_r\}$ and $dy = \prod_{i=1}^{p}(dy_i)$.

The coefficient $\beta_{ir}$ is defined to be the generalized regression coefficient of
$y_i$ on $x_r$.

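Now let $\{y_i^{(\alpha)}\}$ and $\{x_r^{(\alpha)}\}$, where $\alpha = 1, 2, \ldots, n$, represent a sample of size $n$ from
the given population, where we suppose that the $y_i^{(\alpha)}$ and $x_r^{(\alpha)}$ are deviations from
the sample means, so that
$$\sum_\alpha y_i^{(\alpha)} = 0 \quad (i = 1, 2, \ldots, p) \quad\text{and}\quad \sum_\alpha x_r^{(\alpha)} = 0 \quad (r = 1, 2, \ldots, m).$$

We fit lines of the form $Y_i = b_{ir}x_r$ to the given data by choosing the coefficients
$b_{ir}$, which are estimates of the $\beta_{ir}$, so as to make the quantity
$$\sum_\alpha\left\{\alpha_{ij}\left(y_i^{(\alpha)}-Y_i^{(\alpha)}\right)\left(y_j^{(\alpha)}-Y_j^{(\alpha)}\right)\right\},$$
where $Y_i^{(\alpha)} = b_{ir}x_r^{(\alpha)}$, a minimum.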


This expression is a minimum when $\{b_{ir}\}$ satisfy the following $mp$ equations
$$\alpha_{ij}\sum_\alpha x_r^{(\alpha)}\left(y_j^{(\alpha)}-b_{js}x_s^{(\alpha)}\right) = 0,$$
i.e.
$$\alpha_{ij}(d_{jr}-g_{rs}b_{js}) = 0,$$
where
$$g_{rs} = \sum_\alpha x_r^{(\alpha)}x_s^{(\alpha)} \quad\text{and}\quad d_{jr} = \sum_\alpha y_j^{(\alpha)}x_r^{(\alpha)}.$$

These equations are satisfied when $g_{rs}b_{is} = d_{ir}$. Hence
$$b_{ir} = \frac{G_{rs}}{|G|}\,d_{is},$$
where $G_{rs}$ is the cofactor of $g_{rs}$ in the determinant $|G| = \|g_{rs}\|$. The quantities
$b_{ir}$, being weighted sums of the $y_i^{(\alpha)}$, are distributed normally and it is easy to
show that

(1) $E(b_{ir}) = \beta_{ir}$;

(2) The variance of $b_{ir}$ is $\dfrac{G_{rr}}{|G|}\,c_{ii}$ ($i$ and $r$ not summed);

(3) The covariance of $b_{ir}$ and $b_{js}$ is $\dfrac{G_{rs}}{|G|}\,c_{ij}$.
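Now
$$\sum_\alpha\left(y_i^{(\alpha)}-Y_i^{(\alpha)}\right)\left(y_j^{(\alpha)}-Y_j^{(\alpha)}\right)
 = \sum_\alpha\left(y_i^{(\alpha)}-b_{ir}x_r^{(\alpha)}\right)\left(y_j^{(\alpha)}-b_{js}x_s^{(\alpha)}\right)$$
$$= \sum_\alpha y_i^{(\alpha)}y_j^{(\alpha)} - b_{ir}\sum_\alpha x_r^{(\alpha)}y_j^{(\alpha)} - b_{js}\sum_\alpha y_i^{(\alpha)}x_s^{(\alpha)} + b_{ir}b_{js}\sum_\alpha x_r^{(\alpha)}x_s^{(\alpha)}$$
$$= \sum_\alpha y_i^{(\alpha)}y_j^{(\alpha)} - \frac{G_{rs}}{|G|}\,d_{ir}d_{js}.$$
Hence
$$\sum_\alpha y_i^{(\alpha)}y_j^{(\alpha)} = \sum_\alpha\left(y_i^{(\alpha)}-Y_i^{(\alpha)}\right)\left(y_j^{(\alpha)}-Y_j^{(\alpha)}\right) + b_{ir}\sum_\alpha y_j^{(\alpha)}x_r^{(\alpha)}.$$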

Let us assume the null hypothesis that the regression coefficients $\beta_{ir}$ are all
zero. Then it may be proved that for each $i$ we can find $(n-1)$ linear functions
$\xi_i^{(\gamma)}$ $(\gamma = 1, \ldots, n-1)$ of the $y_i^{(\alpha)}$ which are independently and normally distributed
with equal variances and zero means, and which are such that
$$\sum_{\alpha=1}^{n}Y_i^{(\alpha)}Y_j^{(\alpha)} = \sum_{\gamma=1}^{m}\xi_i^{(\gamma)}\xi_j^{(\gamma)},$$
$$\sum_{\alpha=1}^{n}\left(y_i^{(\alpha)}-Y_i^{(\alpha)}\right)\left(y_j^{(\alpha)}-Y_j^{(\alpha)}\right) = \sum_{\gamma=m+1}^{n-1}\xi_i^{(\gamma)}\xi_j^{(\gamma)},$$
and
$$\sum_{\alpha=1}^{n}y_i^{(\alpha)}y_j^{(\alpha)} = \sum_{\gamma=1}^{n-1}\xi_i^{(\gamma)}\xi_j^{(\gamma)}.$$
Thus
$$a_{ij} = \frac{1}{m}\,b_{ir}\sum_{\alpha=1}^{n}y_j^{(\alpha)}x_r^{(\alpha)}$$
and
$$a'_{ij} = \frac{1}{n-m-1}\sum_{\alpha=1}^{n}\left(y_i^{(\alpha)}-Y_i^{(\alpha)}\right)\left(y_j^{(\alpha)}-Y_j^{(\alpha)}\right)$$
are independently distributed estimates of $c_{ij}$ with degrees of freedom $m$ and
$(n-m-1)$ respectively. Hence if we put
$$Z = \tfrac12\log_e\left\{\frac{n-m-p}{(n-m-1)\,p}\,\frac{a_{ij}A'_{ij}}{|A'|}\right\},$$
then $Z$ is distributed in Fisher's z distribution with degrees of freedom $N_1$ and $N_2$,
where $N_1 = mp$ and $N_2 = (n-m-p)$.

This gives the required test of significance of the regression of $\{y_i\}$ on $\{x_r\}$.
If the value of Z obtained reaches the level of significance then the null hypothesis
is considered to be disproved.
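As an illustration of the steps of § 4, the following sketch works through the simplest non-trivial case, p = 2 dependent variates and m = 1 independent variate, so that $|G|$ reduces to the single sum $g_{11}$ and the cofactors of $A'$ can be written out directly. The data and all function names are hypothetical and serve only to show the order of the computation; nothing here is taken from the paper beyond the formulae.

```python
import math

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def centre(v):
    mean = sum(v) / len(v)
    return [vi - mean for vi in v]

def regression_z(x, y1, y2):
    """Composite regression test for p = 2 dependent variates on m = 1 regressor."""
    n, m, p = len(x), 1, 2
    x = centre(x)
    y = [centre(y1), centre(y2)]
    g = dot(x, x)                                  # g_11 = sum of x^2
    d = [dot(yi, x) for yi in y]                   # d_i = sum of y_i x
    b = [di / g for di in d]                       # b_i = d_i / g_11
    resid = [[yv - bi * xv for yv, xv in zip(yi, x)] for yi, bi in zip(y, b)]
    # a_ij = (1/m) b_i d_j (regression); a'_ij = residual products / (n - m - 1)
    a = [[b[i] * d[j] / m for j in range(p)] for i in range(p)]
    a_err = [[dot(resid[i], resid[j]) / (n - m - 1) for j in range(p)] for i in range(p)]
    det_err = a_err[0][0] * a_err[1][1] - a_err[0][1] ** 2
    scatter = (a[0][0] * a_err[1][1] - 2.0 * a[0][1] * a_err[0][1]
               + a[1][1] * a_err[0][0]) / det_err  # a_ij A'_ij / |A'|
    z = 0.5 * math.log((n - m - p) / ((n - m - 1) * p) * scatter)
    return {"Z": z, "dof": (m * p, n - m - p), "b": b, "d": d, "y": y, "resid": resid}

# Hypothetical sample of n = 8 observations.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y1 = [2.1, 2.9, 4.2, 4.8, 6.1, 7.2, 7.8, 9.1]
y2 = [1.0, 0.5, 1.8, 1.2, 2.4, 2.0, 3.1, 2.7]
result = regression_z(x, y1, y2)
print(result["Z"], result["dof"])
```

The returned Z would be referred to Fisher's z table with the degrees of freedom given in `dof`, here $N_1 = mp = 2$ and $N_2 = n-m-p = 5$.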
MISCELLANEA


(i) Applicability of the z Test to a Poisson Distribution 


By R. A. CHAPMAN 


Assistant Silviculturist, Southern Forest Experiment Station 


THE distribution of z was derived on the assumption that the parent population was
normally distributed; in actual practice this assumption usually does not hold. Several
investigators have studied the distribution of z or its related function $\eta^2$* obtained from
non-normal populations. Pearson (1931) studied the distribution of $\eta^2$ for several types of
population, in which the values for $\beta_1$ and $\beta_2$ were (0.0, 2.5), (0.0, 4.1), (0.0, 7.1), (0.2, 3.3),
(0.5, 3.7), and (1.0, 3.8). The results of these experiments suggest that within the range of
the experimental populations tried, the assumption of normality gives satisfactory results
for most work. Eden & Yates (1933) also did some sampling work, using height measurements
of wheat, but they did not deal with a very skewed distribution. At the Southern
Forest Experiment Station the author recently had occasion to draw experimental samples
from a distribution even more skewed than that used by Pearson or by Eden and Yates.
This paper presents a brief report of the results obtained.


TABLE I

Original population sampled, and range of numbers used in sampling

    Value of     Number of      Range of
    variable     occurrences    numbers
       0            368         000-367
       1            368         368-735
       2            184         736-919
       3             61         920-980
       4             15         981-995
       5              3         996-998
       6              1         999

The parent population sampled was a Poisson distribution with a mean equal to 1,
$\beta_1 = 1.0401$, $\beta_2 = 4.1031$.† This form of distribution was found in a study of the effect of
greenhouse treatments on the mortality of longleaf and slash pine seedlings. The actual
distribution sampled is shown in column 2 of Table I. From this population 100 samples
of 16 values each were drawn, with the aid of Tippett's random numbers, using the last
three of the four digits in a column. Each 3-digit number drawn was then referred to the
class interval shown in column 3 of Table I. If the number was between 000 and 367 it
was called 0; if it was between 368 and 735 it was called 1; and so on. As the numbers
(samples) were drawn they were separated into sub-samples of four values each. For each
of the samples the total sum of the squares was divided into two parts: between means of
sub-samples, with three degrees of freedom; and within sub-samples, with 12 degrees of
freedom. Using the two estimates of variance so derived, an F value was computed:
$$F = \frac{\sigma_B^2}{\sigma_W^2}.$$
This is the ratio of the variance between means of sub-samples to the variance within
sub-samples. For convenience of computation the results are presented as F values rather
than as z values.

* $\eta^2 = S[n_s(\bar{Y}_s-\bar{Y})^2]/S[(Y-\bar{Y})^2]$.

† Due to some small discrepancy in the frequency distribution, the values of $\beta_1$ and $\beta_2$
differed from their theoretical values of 1 and 4, respectively.

TABLE II

Actual and theoretical frequency of F with χ² test

    Class interval of F    Actual           Theoretical      (a − t)    (a − t)²/t
    (central values)       frequency (a)    frequency (t)
    0.2                        22               24.4           −2.4       0.2361
    0.6                        33               23.8           +9.2       3.5563
    1.0                        17               16.6           +0.4       0.0096
    1.4                         8               11.0           −3.0       0.8182
    1.8                         4                7.3           −3.3       1.4918
    2.2                         4                4.8           −0.8       0.1333
    2.6                         4                3.6           +0.4       0.0444
    3.0                         2                2.1
    3.4                         1                1.8
    3.8                         2                1.1
    Above 4.0                   3                3.5
    (last four pooled)          8                8.5           −0.5       0.0294
    Total                     100              100.0                      6.3191 = χ²

[Fig. 1. Actual values and theoretical values plotted against the F value on a logarithmic scale.]

The actual distribution of F from these 100 samples is shown in Table II, column 2,
and in Fig. 1. The theoretical distribution of F is also shown.

The Chi-square test of the comparison of actual and theoretical frequencies in Table II
gives a χ² of 6.3191 which, with 7 degrees of freedom, has a probability of about 0.5. The
agreement between the actual and theoretical distributions is therefore satisfactory, and
this result confirms the conclusion reached by others that the z test is applicable to skewed
distributions.
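The χ² comparison of Table II may be retraced as follows. The sketch assumes the eight classes of the table (the last four classes pooled) and evaluates the upper-tail probability from the closed form that holds for an odd number of degrees of freedom, $Q(\chi^2\,|\,k) = \operatorname{erfc}(\chi/\sqrt2) + 2\varphi(\chi)\sum_{r=1}^{(k-1)/2}\chi^{2r-1}/\{1\cdot3\cdots(2r-1)\}$, where $\varphi$ is the normal ordinate; the function name is illustrative.

```python
import math

# Table II, with the last four classes pooled into one (8 classes in all).
actual = [22, 33, 17, 8, 4, 4, 4, 8]
theoretical = [24.4, 23.8, 16.6, 11.0, 7.3, 4.8, 3.6, 8.5]

chi_square = sum((a - t) ** 2 / t for a, t in zip(actual, theoretical))

def chi2_upper_tail_odd(x, k):
    """P(chi-square with k degrees of freedom exceeds x), valid for odd k only."""
    if k % 2 == 0:
        raise ValueError("this closed form applies to odd degrees of freedom")
    chi = math.sqrt(x)
    normal_ordinate = math.exp(-x / 2.0) / math.sqrt(2.0 * math.pi)
    term, series = chi, 0.0
    for r in range(1, (k - 1) // 2 + 1):
        series += term               # chi^(2r-1) / (1*3*...*(2r-1))
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2.0)) + 2.0 * normal_ordinate * series

p_value = chi2_upper_tail_odd(chi_square, k=len(actual) - 1)
print(round(chi_square, 4), round(p_value, 3))
```

The statistic should agree with the tabulated total to within rounding of the individual entries, and the probability should come out close to one half, as stated.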

REFERENCES
Biometrika, 23, 114-33.

(ii) The Distribution of the Ratio of Estimates of the Two Variances
in a Sample from a Normal Bi-variate Population


By D. J. FINNEY, B.A., Clare College, Cambridge 


THE distribution of this ratio has been investigated by Bose (1935) by a method dependent 
on term-by-term integration of infinite series. The simplicity of his result suggested the 
possibility of the more direct approach given below, which is followed by the evaluation of 
the probability integral for the distribution. It is then shown how, by a simple transformation 
of existing tables, a test of significance may be applied when the population correlation 
coefficient is known, and how the test may be adapted when only a sample estimate of this 
correlation is available. 

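By a suitable choice of scales, any normal bi-variate distribution function may be
written as
$$\phi(x_1, x_2) = \frac{1}{2\pi(1-\rho^2)^{1/2}}\exp\left\{-\frac{x_1^2-2\rho x_1x_2+x_2^2}{2(1-\rho^2)}\right\}.$$

The three second-order sums for a sample of size $n$ from this population may be defined as
$$c_{ij} = \sum_{p=1}^{n}(x_{ip}-\bar{x}_i)(x_{jp}-\bar{x}_j), \quad i, j = 1, 2,$$
where $(x_{1p}, x_{2p})$ are corresponding pairs of observations. The distribution of the $c_{ij}$ is
$V'(c_{11}, c_{22}, c_{12})\,dc_{11}\,dc_{22}\,dc_{12}$, where
$$V'(c_{11}, c_{22}, c_{12}) = K\,(c_{11}c_{22}-c_{12}^2)^{\frac{n-4}{2}}\exp\left\{-\frac{c_{11}-2\rho c_{12}+c_{22}}{2(1-\rho^2)}\right\}$$
and
$$\frac{1}{K} = \pi^{1/2}\,2^{n-1}(1-\rho^2)^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(\frac{n-2}{2}\right).$$

If $s_1^2$, $s_2^2$ are the estimates of the variances of $x_1$ and $x_2$ and $r$ is the estimate of $\rho$ from this
sample, then
$$s_1^2 = \frac{c_{11}}{n-1}, \quad s_2^2 = \frac{c_{22}}{n-1}, \quad r = \frac{c_{12}}{(c_{11}c_{22})^{1/2}}.$$
It follows that the distribution of $\omega = s_1/s_2$ is $V(\omega)\,d\omega$, where
$$V(\omega) = 2K\omega^{n-2}\int_{-1}^{1}dr\int_{0}^{\infty}dc_{22}\,(1-r^2)^{\frac{n-4}{2}}\,c_{22}^{\,n-2}\exp\left\{-\frac{c_{22}(1-2\rho\omega r+\omega^2)}{2(1-\rho^2)}\right\}$$
$$= 2K'\omega^{n-2}\int_{-1}^{1}dr\,(1-r^2)^{\frac{n-4}{2}}(1-2\rho\omega r+\omega^2)^{-(n-1)},$$
with $K' = K\,\Gamma(n-1)\,2^{n-1}(1-\rho^2)^{n-1}$.

The substitution
$$r = \frac{(\lambda+\mu)v-(\lambda-\mu)(1-v)}{(\lambda+\mu)v+(\lambda-\mu)(1-v)},$$
where $\lambda = 1+\omega^2$, $\mu = 2\rho\omega$, reduces the integral to the form
$$V(\omega) = 2^{n-2}K'\omega^{n-2}(\lambda^2-\mu^2)^{-\frac{n}{2}}\int_0^1\left[(\lambda+\mu)\,v^{\frac{n-2}{2}}(1-v)^{\frac{n-4}{2}}+(\lambda-\mu)\,v^{\frac{n-4}{2}}(1-v)^{\frac{n-2}{2}}\right]dv$$
$$= \frac{2(1-\rho^2)^{\frac{n-1}{2}}}{B\left(\frac{n-1}{2}, \frac{n-1}{2}\right)}\,\frac{\omega^{n-2}}{(1+\omega^2)^{n-1}}\left\{1-\frac{4\rho^2\omega^2}{(1+\omega^2)^2}\right\}^{-\frac{n}{2}},$$
which is the result given by Bose. If the population values of the variances are $\sigma_1^2$, $\sigma_2^2$ the
same distribution holds for
$$\omega = \frac{s_1/\sigma_1}{s_2/\sigma_2}.$$

When $\rho = 0$, the distribution reduces to
$$V(\omega) = \frac{2}{B\left(\frac{n-1}{2}, \frac{n-1}{2}\right)}\,\frac{\omega^{n-2}}{(1+\omega^2)^{n-1}},$$
which is a particular case of that obtained by Fisher for the ratio of two independent estimates
of a variance.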

Now the distributions of $\omega$ and $\omega^{-1}$ are identical, as is otherwise obvious from the
definition of $\omega$. Thus a sample value of the ratio may be so chosen as to give $\omega > 1$, and the
probability of obtaining $\omega > \Omega > 1$ by random sampling is then
$$P(\Omega) = \int_\Omega^\infty V(\omega)\,d\omega \;\propto\; \int_\Omega^\infty\left(\frac{\omega}{1+\omega^2}\right)^{n-1}\left\{1-4\rho^2\left(\frac{\omega}{1+\omega^2}\right)^2\right\}^{-\frac{n}{2}}\frac{d\omega}{\omega}.$$
The substitution
$$\omega^2+\omega^{-2} = e^{2z}+e^{-2z}-\rho^2(e^{2z}+e^{-2z}-2)$$
transforms this to
$$P(\Omega) \;\propto\; \int_Z^\infty\frac{e^{(n-1)z}}{(1+e^{2z})^{n-1}}\,dz,$$
which is the probability integral of Fisher's z-distribution with degrees of freedom
$N_1 = N_2 = n-1$, whence it follows that
$$P(\Omega) = I_x\left(\frac{n-1}{2}, \frac{n-1}{2}\right),$$
where
$$x = \frac12\left\{1-\frac{\Omega-\Omega^{-1}}{\sqrt{(\Omega+\Omega^{-1})^2-4\rho^2}}\right\},$$
and the probability integral can be read from tables of the Incomplete Beta Function.
Alternatively, significance levels can be constructed by entering Fisher's z-table with
$n_1 = n_2 = n-1$ and, when $\rho$ is known, finding the value of $\Omega$ corresponding to the $Z$ so obtained.


Thus with $n = 5$, for various values of $\rho$ the 5 % and 1 % points of $\Omega^2$ are as follows:

    ρ             0.0      0.1      0.3      0.5      0.7     0.9     0.95    0.99
    5 % point    6.388    6.342    5.968    5.217    4.072   2.457   1.923   1.348
    1 % point   15.978   15.837   14.710   12.450    9.050   4.443   3.040   1.686
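Entries of this kind follow from the substitution connecting $\Omega$ and $Z$: writing $F = e^{2Z}$ for the tabulated variance ratio with $n-1$ and $n-1$ degrees of freedom, one has $\Omega^2+\Omega^{-2} = (F+F^{-1})-\rho^2(F+F^{-1}-2)$, a quadratic in $\Omega^2$. A minimal sketch, assuming the values $F = 6.388$ and $F = 15.978$ read from the column for $\rho = 0$ (the function name is illustrative):

```python
import math

def omega_sq_point(f_point, rho):
    """Percentage point of Omega^2 for correlation rho, given the point F = e^(2Z)."""
    e = f_point + 1.0 / f_point                 # e^(2Z) + e^(-2Z)
    w = e - rho ** 2 * (e - 2.0)                # Omega^2 + Omega^(-2)
    return 0.5 * (w + math.sqrt(w * w - 4.0))   # the root with Omega^2 >= 1

F_5_PERCENT, F_1_PERCENT = 6.388, 15.978        # n = 5, i.e. 4 and 4 degrees of freedom
for rho in (0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 0.95, 0.99):
    print(rho,
          round(omega_sq_point(F_5_PERCENT, rho), 3),
          round(omega_sq_point(F_1_PERCENT, rho), 3))
```

The printed values should reproduce the table to within a unit or so in the last place, the remaining differences being attributable to rounding of the starting values.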
This test for significance can only be applied when the population parameter, $\rho$, is known.
When only a sample estimate of $\rho$, $r$ say, is available, the method suggested by Hirschfeld
(1937) can be adopted. For fixed $n$ and $\Omega$, $P(\Omega)$ is a monotonic function of $\rho^2$ and $P(\Omega) \to 0$
as $\rho^2 \to 1$. Thus, if $Z$ is the entry of Fisher's table corresponding to $n_1 = n_2 = n-1$ at the
chosen level of significance, $\Omega$, determined from the sample, will be significant if $\rho^2 > P^2$,
where
$$P^2 = \frac{e^{2Z}+e^{-2Z}-\Omega^2-\Omega^{-2}}{e^{2Z}+e^{-2Z}-2}.$$

Clearly the meaning of a negative value of $P^2$ would be that significance is obtained by
the ordinary z-test ($\Omega^2 > e^{2Z}$) and that therefore $\Omega$ is significant whatever the value of $\rho$
may be.

If, when tested by Fisher's transformation, $|r|$ is significantly greater than the critical
value $|P|$, the significance of $\Omega$ is assured. It is clearly not necessary that $r$ should be determined
from the same sample as $\Omega$ and it will be advantageous to use a more precise estimate
when such is available. When $r$ is small and based on few degrees of freedom there is little
hope of attaining significance by this method if the ordinary z-test has failed to show its
existence, but for a large $r$ with many degrees of freedom the value of $\Omega$ for which significance
is reached will be very considerably reduced.


Example. In a paper on “Physical measurements and vital capacity” Mumford &
Young (1923) give results of measurements of standing height and stem length for different
age groups of schoolboys. Taking all measurements as percentages of their respective means,
from Tables I and II of this paper it is found that, for 173 boys aged 13-14 years,

    Standard deviation of standing height = 5.299 %,
    Standard deviation of stem length     = 4.766 %.

Hence $\Omega^2 = 1.236$. Also $r = 0.878$.

Using Fisher (1936), § 41, to obtain the 1 % point of $Z$ with $n_1 = n_2 = 172$,
$$e^{2Z} = 1.427$$
and it follows that $P = 0.805$.

Transforming the correlations,
$$z_r = 1.367, \qquad z_P = 1.114,$$
and it is seen that $0.253\sqrt{170} = 3.30$ is a unit normal deviate significant at the 1 % level.
It is thus demonstrated that the stem length is proportionately less variable than the
standing height in the population considered.
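The steps of this example can be followed in a short sketch. It assumes the critical value relation $P^2 = (e^{2Z}+e^{-2Z}-\Omega^2-\Omega^{-2})/(e^{2Z}+e^{-2Z}-2)$, which is the substitution of the preceding section solved for $\rho^2$, and Fisher's transformation $z = \tanh^{-1} r$ with standard error $1/\sqrt{n-3}$; the variable names are illustrative.

```python
import math

sd_height, sd_stem = 5.299, 4.766      # standard deviations, per cent of the means
r, n = 0.878, 173                      # sample correlation and number of boys
e2z = 1.427                            # e^(2Z) at the 1 % point, 172 and 172 d.f.

omega_sq = (sd_height / sd_stem) ** 2
e = e2z + 1.0 / e2z                    # e^(2Z) + e^(-2Z)
w = omega_sq + 1.0 / omega_sq          # Omega^2 + Omega^(-2)
critical_rho = math.sqrt((e - w) / (e - 2.0))

z_r = math.atanh(r)                    # Fisher's transformation of r
z_p = math.atanh(critical_rho)         # and of the critical value P
deviate = (z_r - z_p) * math.sqrt(n - 3)

print(round(omega_sq, 3), round(critical_rho, 3), round(z_r, 3), round(deviate, 2))
```

Because the printed figures were rounded at each stage, the values obtained here may differ from those in the text by a few units in the last place; the conclusion that the deviate exceeds the 1 % point of the normal curve is unaffected.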


REFERENCES

BOSE, S. (1935). Sankhyā: Indian J. Statistics, 1, 65.
FISHER, R. A. (1936). Statistical Methods for Research Workers, 6th ed.
HIRSCHFELD, H. O. (1937). Biometrika, 29, 65.
MUMFORD, A. A. & YOUNG, M. (1923). Biometrika, 15, 108.


































(iii) Gauss’ Quadratic Formula with Twelve Ordinates 


By B. DE F. BAYLY


Assistant Professor in Electrical Engineering, University of Toronto


J. O. IRWIN (1923) has pointed out the desirability of knowing the constants for Gauss'
quadrature formula using twelve ordinates. This computation, which is quite laborious, has
been completed, and the results are given herewith.

Legendre's polynomial of the twelfth degree is
$$P_{12}(x) = \frac{1}{1024}\left(676039\,x^{12} - 1939938\,x^{10} + 2078505\,x^{8} - 1021020\,x^{6} + 225225\,x^{4} - 18018\,x^{2} + 231\right).$$

If this is equated to zero the following values of the roots are obtained:

    a_1    −0.9815 6063 4246 7     log (−a_1)    1̄.9919 1713 2571 1812 28
    a_2    −0.9041 1725 6370 5     log (−a_2)    1̄.9562 2475 8453 6039 54
    a_3    −0.7699 0267 4194 3     log (−a_3)    1̄.8864 3582 8118 1096 56
    a_4    −0.5873 1795 4286 6     log (−a_4)    1̄.7688 7327 7411 0133 37
    a_5    −0.3678 3149 8998 2     log (−a_5)    1̄.5656 4891 7004 6865 00
    a_6    −0.1252 3340 8511 5     log (−a_6)    1̄.0977 2020 1052 5827 95

The remaining roots $a_7$ to $a_{12}$ are equal to $a_6$ to $a_1$, only with positive sign.

The equation for finding an integral is as follows:
$$\int_p^q f(x)\,dx = \frac{q-p}{2}\sum_{n=1}^{12} b_n\,f\left(\frac{q+p}{2}+\frac{q-p}{2}\,a_n\right),$$
the values of $b_n$ being given in the following table:

    b_1 and b_12    0.0471 7533 6386 4
    b_2 and b_11    0.1069 3932 5995 3
    b_3 and b_10    0.1600 7832 8543 4
    b_4 and b_9     0.2031 6742 6723 2
    b_5 and b_8     0.2334 9253 6538 4
    b_6 and b_7     0.2491 4704 5813 4

As a check on these values $\log_e 2$ was calculated and found correct to the thirteenth place.

In the article referred to above it was suggested that Gauss' method with twelve ordinates
would be satisfactory for computing such functions as the incomplete Beta-function. The
function*
$$I_{0.6}(16.1, 5.2) = \frac{\displaystyle\int_0^{0.6}x^{15.1}(1-x)^{4.2}\,dx}{\displaystyle\int_0^{1}x^{15.1}(1-x)^{4.2}\,dx}$$
was computed by this method and the value found was
0.0567 0985 9126,
the correct value being 0.0567 0986 1893.

* Only the numerator was calculated by this method. The denominator of course is obtained
from tables of the Gamma-function.

The error apparently is about one part in twenty million. Owing to lack of time however
this final check computation has not been very carefully checked. In any event the use of
Gauss' method is not recommended for functions of this type as the above computation took
several hours even with every possible aid to calculation. It is felt that with functions of this
type other quadrature formulae even using three times as many ordinates would be less
laborious.
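Both checks described in this note can be repeated mechanically. The sketch below assumes the tabulated roots and coefficients rounded to thirteen places (with $b_5 = b_8 = 0.2334\,9253\,6538\,4$, the value for which the twelve coefficients sum to 2), and the standard form of the rule in which the coefficients multiply half the range of integration; the complete Beta-function in the denominator is taken from the Gamma function of the Python library rather than from tables. The names `gauss12`, `NODES` and `WEIGHTS` are illustrative.

```python
import math

# Positive roots a_12 ... a_7 and the corresponding coefficients b_12 ... b_7.
NODES = [0.9815606342467, 0.9041172563705, 0.7699026741943,
         0.5873179542866, 0.3678314989982, 0.1252334085115]
WEIGHTS = [0.0471753363864, 0.1069393259953, 0.1600783285434,
           0.2031674267232, 0.2334925365384, 0.2491470458134]

def gauss12(f, p, q):
    """Twelve-ordinate Gauss quadrature of f over the interval (p, q)."""
    half, mid = 0.5 * (q - p), 0.5 * (q + p)
    total = 0.0
    for a, b in zip(NODES, WEIGHTS):
        # the roots occur in pairs +a and -a with equal coefficients
        total += b * (f(mid - half * a) + f(mid + half * a))
    return half * total

# Check 1: log_e 2 as the integral of 1/x from 1 to 2.
log2_estimate = gauss12(lambda x: 1.0 / x, 1.0, 2.0)

# Check 2: incomplete Beta-function ratio I_0.6(16.1, 5.2).
numerator = gauss12(lambda x: x ** 15.1 * (1.0 - x) ** 4.2, 0.0, 0.6)
complete_beta = math.gamma(16.1) * math.gamma(5.2) / math.gamma(21.3)
beta_ratio = numerator / complete_beta

print(log2_estimate - math.log(2.0), beta_ratio)
```

On a modern machine the whole computation is immediate, so the author's objection on grounds of labour no longer applies, although the remark that the accuracy depends on the function integrated still holds.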


REFERENCE

IRWIN, J. O. (1923). Tracts for Computers, edited by Karl Pearson; No. X: “On quadrature
and cubature, or on methods of determining approximately single and double integrals.”
Camb. Univ. Press.


(iv) Introduction to Mathematical Probability. By J. V. USPENSKY. London:
McGraw-Hill Publishing Company, 1937. Price 30s.


THERE are two principles which should be followed by the writer of any elementary text-book.
First, the theory should be set out simply and directly, so that it is intelligible to a reader
who has no previous knowledge of the subject, and secondly, the theory should be illustrated
by a number of worked examples, so that the reader having been shown “why” can understand
“how”. Prof. Uspensky follows these two principles, and his book should become a
model for writers on the theory of probability for many years.

The author gives a clear delineation of the development of the classical theory of probability
of to-day. He does not attempt to give its applications to other sciences, but his
illustrations are such that the reader will have very little difficulty in finding these applications
for himself. An example of this is found in his derivation of the distribution of
“Student's” “t” using characteristic functions.

At a time when many books on probability are being written, and when the theory of 
probability is being applied in many different fields, it is very satisfactory to find the theory 
developed with such absolute clarity and unusual attention to rigour. Much is presented 
which hitherto has been unattainable except by a study of the literature of the Russian 
school, but it is possible to learn much also from his treatment of the theorems which are 
well known; for example, the proof of the famous and much-discussed theorem of Laplace is 
considerably enhanced by the method of estimating the error involved in its application. 

Chapters I and II contain approximately the theory of probability as usually given in
text-books on algebra. Chapter III discusses the problem of repeated trials and contains
a very ingenious method, due to Markoff, of approximating to large factorials, and the sum
of large factorials, by means of continued fractions. Chapter IV is exceptionally valuable to
students of the theory of estimation, for the author discusses thoroughly the theorem of
Bayes and its applications, and leaves no room for doubt of the fact that its application to
practical problems is usually invalid, because of the lack of the necessary data. Perhaps here
an example might have been added on its validity when applied to certain problems arising
in the Mendelian theory.

Chapter V introduces us to the simple theory of “Markoff chains”, and the use of difference
equations in solving questions in probability. Cantelli's theorem on the upper limit of
a probability is given in Chapter VI, while Chapter VII contains the proof of the theorem of
Laplace to which we have already referred. The succeeding chapter on “Further Considerations
on Games of Chance” is not important from a point of view of the theory of probability,
although it may be read with profit for its ingenious algebra.

In Chapter IX we find the first discussion of a stochastic variable, and the elements of
the mathematical theory of expectations are developed so as to lead us easily and naturally
in the next chapter to Tschebysheff's Lemma, the law of large numbers, and Markoff's
theorem on the large numbers. The author discusses shortly the “strong law of large numbers”,
this last being proved as an example at the end of the chapter. These laws are illustrated by
numerical examples in both Chapters X and XI.

Chapter XII is headed “Probabilities in Continuum”, and is concerned with the definitions
of the characteristic function and the distribution function.

Prof. Uspensky states in his preface that these twelve chapters may be read by persons
“without advanced mathematical knowledge”, while the remaining chapters, incorporating
the results of modern researches, require from their readers a “more mature mathematical
preparation”. While the present writer thinks that the words “without advanced mathematical
knowledge” might be qualified, since some of the analysis is by no means easy, there
is no doubt that Chapters XIII onwards are unrivalled in any comparable English text-book
for the beauty and elegance of their methods of analysis.

Chapter XIII discusses the Stieltjes Integral and its application in the theory of characteristic
functions, and Liapounoff's inequality for moments. The examples given require
a knowledge of contour integration. The next chapter follows in logical sequence with
applications of this theory to further problems. Here we find Liapounoff's theorem stated
and proved with the aid of the characteristic function and the Liapounoff inequality.

The remaining two chapters are of interest primarily to statisticians. The bivariate
normal surface is discussed with the aid of the previous analysis, and the distributions of
several different functions of normally distributed variables are obtained, notably those of
s, r and t.

The whole volume is illustrated by a wealth of examples, each of which adds to our 
understanding of the theory, if not to the theory itself. It is a pleasant surprise and stimulus 
to find theorems set as an exercise, with the outline of their proof given as an aid. This book 
is so good that it should remain a classic in the literature of the theory of probability for 
many years. 

One minor point of criticism might be raised. The present writer, at least, finds that the
notation used by Prof. Uspensky in the first few chapters is confusing. Consider for example,
the theorem on compound probability on p. 31. Prof. Uspensky writes

    (AB) = (A).(B, A)

which is interpreted by the statement “the probability of the simultaneous occurrence of
A and B is given by the product of the unconditional probability of the event A, by the
conditional probability of B supposing A actually occurred”. It seems to the writer that the
following notation is less confusing:

    P{AB} = P{A}.P{B | A},

which expression is interpreted in the same way as the above, where P{ } stands for “the
probability that”. However, notation is merely a matter of taste, and this small point does
not detract from the value of the book as a whole.

Prof. Uspensky modestly describes the subject of his book as the Elementary Theory of
Probability. This raises the hope that one day we shall have another book from his pen in
which he will write of the theory of probability based on the theory of measure and Lebesgue-Stieltjes
integration. Such a book would be read eagerly by all those who have enjoyed this
present volume.


F. N. DAVID.
Department of Statistics,
University College, London.


(v) Heterostylism in natural populations of the Primrose,
Primula acaulis


By J. B. S. HALDANE


THE primrose is one of the heterostylic species of Primula, the flowers being either
“thrum” with short style and anthers at the mouth of the tube, or “pin” with long style
and anthers in the tube. It is known that the two forms exist in nature in about equal
numbers; and that “legitimate” unions between the two types are much more fertile than
“illegitimate” unions within a type (Darwin, 1877). Gregory (1915) found that thrum is
dominant to pin, all natural thrums examined being heterozygous, so that the union of thrum
and pin gave the two types in almost equal numbers (229 thrum, 236 pin), while thrum
selfed gave 3 thrum: 1 pin (39 thrum, 13 pin).

In counting natural populations I had two objects in view, to see whether the ratio of 
the two types diverged significantly from unity, and whether individual populations varied 
significantly from the mean proportion. I usually counted between 100 and 200 plants 
growing as closely together as possible, so that they might be regarded as a naturally inter- 
crossing population. Most of the populations were found on roadsides in Wales and 
southern England. Those at Garreg 1 and Port Meirion were in open woods, that at 
Garreg 2 in a pasture, while those at Ymstyllynn and Pangbourne were by the sides of 
railways. 

A certain difficulty was occasionally experienced in deciding whether two plants 
growing close together could have arisen by vegetative reproduction from one seedling. 
Where there was a doubt only one flower was observed, even though further inspection 
sometimes showed both thrum and pin plants in the same clump. 

The results are given in Table I together with those of Darwin (1877), Scott (quoted by
Darwin) and v. Tschermak (1923). It may be remarked that v. Tschermak gives Darwin's
figures incorrectly. v. Tschermak's sample was from a single locality. It will be seen that
the totals in no case diverge significantly from equality. The grand total gives 50.83 ±
0.79 % thrums. Thus if this is the true ratio another 6000 or so plants will have to be
counted to establish a probably significant deviation from equality such as de Winton &
Haldane (1933) found in experimental crosses of pin × thrum (but not thrum × pin) in
Primula sinensis. The former cross gave 51.45 ± 0.67 %, the latter 49.34 ± 0.60 % of thrums.
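The two elementary calculations used in this paragraph and in Table I are the χ² for an expectation of equality, $(T-P)^2/(T+P)$ for $T$ thrum and $P$ pin plants, and the percentage of thrums with its standard error $\sqrt{pq/N}$. A minimal sketch, applied to the counts of Darwin and v. Tschermak and to the grand totals of Table I (the function names are illustrative):

```python
import math

def chi2_equality(thrum, pin):
    """Chi-square (one degree of freedom) for a 1 : 1 expectation."""
    return (thrum - pin) ** 2 / (thrum + pin)

def percent_with_se(thrum, pin):
    """Percentage of thrums and its binomial standard error, both in per cent."""
    total = thrum + pin
    p = thrum / total
    return 100.0 * p, 100.0 * math.sqrt(p * (1.0 - p) / total)

darwin = chi2_equality(40, 39)
tschermak = chi2_equality(758, 745)
percent, standard_error = percent_with_se(2026, 1958)   # grand totals of Table I

print(round(darwin, 3), round(tschermak, 3), round(percent, 2), round(standard_error, 2))
```

The results may be compared with the corresponding entries of Table I and with the percentage quoted above; agreement to within a few units in the last figure is to be expected.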

We have next to ask whether it is legitimate to calculate the standard error of this ratio.
Can the individual populations be regarded as samples from a single large population? Or
are they heterogeneous, even though their sum gives a ratio consistent with equality? The
values of χ² for an expectation of equality are given in Table I. The total for my data is
χ² = 24.93 with n = 17. Using Haldane's (1938) equation (19) we find P = 0.096. For all the
data χ² = 27.02 for n = 20, so P = 0.131.

If we wish merely to test for homogeneity, with one less degree of freedom, we can use 
the following transformation (Haldane, 1936a). 

If c be the true frequency of one class in a (2 x n)-fold table, and c’ the assumed value 
(here $), if N be the total number of the population, and if vy’? be the value of y? found when 
the value c’ is assumed, then the true value is 


l 

2 — —— c’(1—c’) x’”*—(c—c’)? N]. 

Be ag Oe oP a) 
For my seventeen populations c= 0-49088, so v?= 24-49, for n=16. Hence P=0-080. 

For all twenty populations c= 0-49146, y?= 25-82, n= 19, P=0-135. There is thus an indi- 

cation, but certainly no proof, of heterogeneity. Nevertheless, I am inclined to suspect 

that larger counts would reveal it. For I got the definite impression that runs of five or 

TABLE I 
Place Thrum Pin χ² 

Red Roses (Pembroke) 92 
Bredenbury 1 (Hereford) 33 
Machynnilleth (Montgomery) 80 
Newport (Pembroke) 67 
Chancery (Cardigan) 63 
Bromlys 1 (Brecon) 58 0·648 
Pangbourne (Berks) 67 0·694 
Bredenbury 2 (Hereford) 46 0·367 
Haverfordwest (Pembroke) 76 0·228 
Garreg 1 (Caernarvon) 93 0·088 
Tenby (Pembroke) 70 0·066 
Port Meirion (Merioneth) 81 0·416 
Garreg 2 (Caernarvon) 80 0·536 
Bromlys 2 (Brecon) 74 0·871 
Ymstyllynn (Caernarvon) 51 1·330 
Jeffreston (Pembroke) 89 5·227 
Miscellaneous 10 2.: 

17 1172 1130 24·933 
Scott’s data (Edinburgh) 56 44 1·960 
Darwin’s data (Kent) 40 39 0·013 
v. Tschermak’s data (Austria) 758 745 0·112 

20 2026 1958 27·018 


more plants of the same type were more frequent than they should have been on a basis of 
chance. A ratio of 1·5 for χ²/n could be explained if, on an average, 50% of seedlings 
reproduced themselves once vegetatively, just as the fluctuations in the sex ratios of human 
families would be greater if 50% of all births were monozygotic twins. But I do not think 
the correction for clonal reproduction can have been more than 10% except perhaps in the 
population at Newport, which actually did not give very divergent numbers. 

It is certain that no obvious environmental effect exists on the thrum : pin ratio. The two 
most extreme populations were found on road banks within a few miles of one another. 
The situation is quite unlike that found in Lythrum salicaria (Haldane, 1936b) where the 
frequencies of the three types in different localities were undoubtedly different. The 
reason is probably as follows. Suppose a single pin plant among a number of thrums. Then 
if there is an adequate opportunity for cross-fertilization its pollen will “take” on all the 
thrum plants, since it is probable that legitimate pollen tubes grow quicker than illegitimate 
(cf. Tseng, 1937). Thus equality will be almost if not quite restored in one generation. 
Whereas if there is only one long-styled Lythrum among a number of mid-styled and short- 
styled plants its pollen will only have twice the opportunities of the other types. 

If a significant excess of thrum plants is ultimately found, the explanation is far from 
obvious. Darwin (1877, p. 36) found that when protected from flying insects (but not from 
thrips) pin plants set 19·2 seeds per capsule on an average as a result of fertilization, whilst 
thrums set only 6·2. If this were so we might expect an excess of pins in sparse primrose 
populations, such as furnished the “miscellaneous” group. But only further work will 
confirm or disprove this hypothesis. 

SUMMARY 


The ratio of thrum to pin plants among 2302 primroses did not differ significantly from 
equality. Individual populations did not diverge from equality to a significantly greater 
extent than could be expected as the result of sampling error. 

(vi) Notes of Karl Pearson’s Lectures on the Theory of Statistics, 1894-96* 
By G. U. YULE, F.R.S. 


INTRODUCTION 


In the following abstract of my notes on these early lectures the actual terminology 
has been in general retained: much of it, e.g. the terms “centroid” (centre of gravity) and 
“swing radius” (radius of gyration, root-mean-square radius), is conveyed direct from 
Professor Pearson’s lectures to engineers, and might well puzzle a modern statistician. 
Sentences or paragraphs placed in quotation marks are direct quotations from the notes. 
These lectures are so closely related to the early memoirs that it is desirable to keep in 
mind the dates by which these were completed, as indicated by the dates of receipt by the 
Royal Society. The more important, for which detailed references are given below, are: 


(1) Dissection of compound Normal Distribution Oct. 18th, 1893 

(2) Skew Variation Dec. 12th, 1894 

(3) Note on Regression and Inheritance in the case of two parents. June 5th, 1895 
Proc. Roy. Soc. LVIII, pp. 240-241 

(4) Regression, Heredity and Panmixia Sept. 28th, 1895 

(5) On the Probable Errors, etc. (Pearson and Filon) Oct. 18th, 1897 


The memoir on the dissection of a compound normal distribution had then been 
completed a year before the first course began; the memoir on skew variation was completed 
at the close of the first term of that course; the first note on correlation (including a partial 
regression equation), in which it is stated that ill health had delayed the completion of the 
full memoir, towards the end of the summer term; and the full memoir itself, in which the 
“best” value of the correlation coefficient (i.e. the method of maximum likelihood value, 
obtained from the product-sum formula) was given for the first time, at the end of the 
following long vacation. The long and important memoir on probable errors was not finished 
till after the end of the second course. Dates only occur rather erratically in the notes: 
they have been given when they place the work in a given term. 


* The following article was very kindly prepared by Mr Yule as an additional Appendix to 
my memoir of Karl Pearson (Biometrika, 28, 193-257 and 29, 161-248). It will be reprinted with 
the rest of the memoir when this is published shortly as a separate volume by the Cambridge 
University Press. [E. S. P.] 

The first course opened with a brief outline sketch of history, leading up to a 
“Kollektivmass” definition of statistics. Among the works bearing on theory to which we were 
referred those of Zeuner, Lexis, Edgeworth, Westergaard and Levasseur might be expected: 
but would any other lecturer have thought of suggesting the study of Marey’s La Méthode 
graphique dans les Sciences Expérimentales (1878 and 1885)? Karl Pearson was an enthusiast 
for graphic representation and thought in graphic terms. After this introduction, theory 
proper was begun with Bayes’ Theorem—not with the correlation approach of later years, 
which would hardly have been likely then. Thereafter we were taken to frequency distribu- 
tions, means and moments in general, and a classification of theoretical forms was suggested. 
The binomial series followed, and the normal curve: for an area table of the normal curve 
based on the standard deviation reference could only be made to the short table printed 
in the Gresham Lecture Notes. The discussion of the error in the standard deviation caused 
by an error in any given ordinate, when the standard deviation is determined from the 
moment of any given order, I do not recall seeing given elsewhere. There was then a reversion 
to the moment problem and the moments of the binomial series: the correcting terms in 
these, which seem to have puzzled Professor Fisher,* are simply the correcting terms required 
to give the moments of the representation by histogram or by frequency polygon—i.e. 
the moments of the graphic figure—in terms of the moments of weighted ordinates. Some 
problems on standard deviations evidently concluded the work of the first term. The second 
term, apparently after completing the last subject, began with reference to the sources 
from which examples of skew distributions could be drawn: some of such distributions are 
probably compound, and this led to a series of notes on memoir(1). Some problems on 
inheritance were then interpolated, and after this followed the derivation of frequency curves 
from the binomial series and the hypergeometric series, in fact the work of memoir(2), which 
had been completed only in the previous December. No date in the notes indicates where 
the work of this term ends, but the notes are so extensive that the lectures must almost 
have continued into the summer term. In that term at least will have followed the work 
on correlation, not completely published till the end of the following September. 

A straightforward, organized, logically developed course clearly could hardly then exist 
when the very elements of the subject were being developed: there are occasional breaches 
of continuity, or divergences to subsidiary or illustrative problems that were interesting 
the lecturer: or a difficult problem, e.g. the moments of the hypergeometric series, might 
be simply dropped and taken up again a little later. In the following year this feature 
becomes still more marked. Memory will not now recall exactly what happened, but the 
members of the class were probably largely engaged in practical work: this is the only way 
in which I can account for the lectures apparently not beginning till November 21st. It 
will be seen that such practical work evidently accompanied, or was interpolated in, the 
lecture course at a later stage to test the results arrived at in the lectures on skew corre- 
lation, which in conjunction with the practical work formed a piece of pure research. It 
will be noted also that some lectures on probable errors were inserted in January 1896 
in the middle of those on skew correlation, probably while the test-work was going on. 

The lectures on Theory of Error in May 1896 are of interest: the first set of experiments 
(bisection of line) on which the memoir(6) of 1902 was founded were carried out that summer 
(1896) ((6), p. 243). The curious result for a distribution compounded of two half normal 
curves I do not remember seeing elsewhere. One other point calls for a late apology from 
me: when writing the note “On Reading a Scale” (Jour. Roy. Statist. Soc. 1927) I had no 
recollection that Karl Pearson had directed my attention to preferences and avoidances 
of particular digits thirty-one years before! The note came as a complete surprise. 

Sheppard’s Theorem, which concludes the notes, must presumably have been personally 
communicated to Pearson, as it was not published till some two years later. 


* See footnote to an article on W. F. Sheppard, Annals of Eugenics, VIII (1937), pp. 9-10. 

SUMMARY OF THE LECTURES 


The material is taken from G.U.Y.’s notes of that date, now preserved in the 
Department of Statistics at University College 


Session 1894-1895 

Original meaning of word “statistics”. Outline history: Graunt, Petty, de Witt, Breslau 
mortality statistics, Halley, Kersseboom, Déparcieux, Süssmilch, Achenwall, Playfair, 
Laplace, Quetelet; Edgeworth, Galton and Weldon; Mayr, Block, Gabaglio. Definition: 
“Statistics is simply a name used for aggregate measurements of any facts whatever, 
whether social, physical or biological. The theory of pure statistics is that branch of 
mathematics which deals with the compilation, representation and handling of numerical 
aggregates—and this independently of the facts which the numbers represent. Applied 
Statistics is the application of the methods of pure statistics to special classes of facts— 
biological, physical or political observations for example.” Works on theory cited: Zeuner, 
Lexis, Edgeworth, Westergaard, Levasseur, Marey. 

Bayes’ Theorem: the fundamental principles assumed (1) permanence of statistical 
ratios, (2) equal distribution of ignorance. (Note: “At the Gresham Lectures the audience 
were asked to guess how many white balls there were in a bag of 20 black and white: the 
guesses grouped round 10, quite unreasonably.”) Examples of Bayes’ Theorem. “The 
statistically supported principle of the equal distribution of ignorance.” 

Frequency curves: continuous and discontinuous distributions: great variety of forms. 
Mean, median and most frequent value or mode. Deviations, different meanings. Quartiles, 
percentiles, Galton’s Ogive: disadvantage of representation by percentiles. 

Moments: mean error, mean pth deviation: “probable deviation” in excess and defect, 
“probable error” in this sense. The standard deviation, defined as “the swing radius of the 
curve about the centroid vertical.” Modulus. Skewness, measured by (mean − mode)/standard 
deviation. 


Forms of frequency curve classified in five types: 1. Mode at one end of base. 2. Curve 
rising at a definite angle to base, range limited or unlimited at other end. 3. Range limited 
in one direction, but curve starting tangential to base. 4. Skew, range unlimited in both 
directions. 5. Symmetrical, range limited or unlimited. A function is wanted to cover all 
these forms. Brief reference to frequency surfaces or correlation surfaces for two or more 
variables. 

The binomial distribution: experiments show that “the mathematically possible 
distribution is the experimentally probable distribution.” Illustrations. Representation by 
polygon, becoming a curve in the limit when n is large. Binomial machine. 

Normal curve: s.d. may be determined either from (1) the co-ordinates of the centroid, 
i.e. of the centre of gravity of the area between the curve and its base line, or (2) from the 
swing radius about the centroid vertical: it is usual to take the areas of the elementary 
trapezia as concentrated on their mid-ordinates, but corrections will be required. Moments 
of the normal curve. Error in the s.d. caused by an error in any given ordinate, when the 
s.d. is determined from a moment of any given order. Area table of normal curve (reference 
to Gresham Lecture Notes) and its use: use of three times the s.d. as limit for likely 
deviations. Fitting from mean deviation and from quartiles. Rough test of “goodness of fit” 
by ratio Σ (errors of fit without regard to sign)/N: values for 12 actual distributions, ranging 
from about 6 to 13·5 per cent. 

The standard deviation of the standard deviation for a normal distribution. 

“Reduction of the moments of a curve treated as a series of trapezia to its moments 
when the elementary areas are concentrated along ordinates”: (this is the heading in my 
notes, but the problem taken is to express the moments of the representation by histogram 
(rectangles), or by frequency polygon, in terms of the moments of weighted ordinates: 
the work is that of the memoir ((2), pp. 348 et seq.)). 

Moments of the binomial series. Complete fitting of a binomial, taking the interval c 
between ordinates as unknown, as well as n, p and q. 

Determination of the standard deviation of a ratio z₁/z₂ in terms of the s.d.’s of z₁ and 
z₂, when deviations are assumed small compared with the means and z₁, z₂ are independent. 
(Dec. 20th.) 

General result for any function of the z’s. 
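
The notes give no formulae for these two results. The sketch below therefore assumes the standard first-order forms: for independent z₁, z₂ with means m₁, m₂ and s.d.'s s₁, s₂, the relative variance of the ratio is the sum of the relative variances, and for any function the variance is the sum of squared partial derivatives times variances. All names are illustrative.

```python
import math


def ratio_sd(m1, s1, m2, s2):
    """First-order s.d. of z1/z2 for independent z1, z2 with small relative deviations."""
    return abs(m1 / m2) * math.sqrt((s1 / m1) ** 2 + (s2 / m2) ** 2)


def function_sd(partials, sds):
    """First-order s.d. of f(z1, ..., zk) for independent z's.

    partials holds the derivatives df/dz_i evaluated at the means.
    """
    return math.sqrt(sum((d * s) ** 2 for d, s in zip(partials, sds)))


m1, s1, m2, s2 = 10.0, 0.2, 5.0, 0.1

# The ratio is the special case f = z1/z2, with partials 1/m2 and -m1/m2**2.
direct = ratio_sd(m1, s1, m2, s2)
general = function_sd([1 / m2, -m1 / m2 ** 2], [s1, s2])
print(direct, general)
```

Both routes give the same value, which is the sense in which the ratio is a special case of the general result.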

Statistical sources of skew distributions: homogeneous distributions and compound 
curves. 

The dissection of a compound normal distribution: notes on the memoir(1). 

Some problems in inheritance for a population following the normal law. (1) Parents 
of deviation x in a population with s.d. σ, give rise to a fraternity with mean x/n and 
s.d. σ′: what is the distribution of the next generation? Generalization for successive 
generations. Biological deductions. (2) A normal sub-population of parents is selected with 
mean fA and s.d. XY: what is the distribution of the offspring? 

The slope-relation between the normal curve and the symmetrical binomial. The slope 
relation for general binomial: the resulting skew curve: its moments and method of fitting 
(memoir(2)). The empirical (one-third) relation between mean, median and mode. Reduction 
of this distribution to normal curve. Edgeworth’s distribution (generalized normal curve). 

Generalization of binomial by removing the assumption that “contributory causes” 
are independent: “the theory of interdependence will be based on the assumption that 
the independence of contributory causes is limited by a limited material from which to 
produce effect,” e.g. drawing r balls from a bag containing pn black, qn white. The 
(hypergeometric) series in this case: moments, slope relation, and resulting curves (memoir(2)). 
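
The drawing scheme can be illustrated with a small sketch that builds the hypergeometric series term by term and compares its first two moments with the usual closed forms, mean rp and variance rpq(n − r)/(n − 1). These closed forms are the standard ones and are not quoted from the notes; the bag sizes are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, n_black, n_white, r):
    """Chance of k black balls when r are drawn without return."""
    return comb(n_black, k) * comb(n_white, r - k) / comb(n_black + n_white, r)


def hypergeom_moments(n_black, n_white, r):
    """Mean and variance of the number of black balls, summed over the series."""
    lowest = max(0, r - n_white)
    highest = min(r, n_black)
    probs = {k: hypergeom_pmf(k, n_black, n_white, r)
             for k in range(lowest, highest + 1)}
    mean = sum(k * p for k, p in probs.items())
    variance = sum((k - mean) ** 2 * p for k, p in probs.items())
    return mean, variance


# Hypothetical bag: n = 20 balls, pn = 8 black, qn = 12 white, r = 5 drawn.
n, n_black, n_white, r = 20, 8, 12, 5
p, q = n_black / n, n_white / n

mean, variance = hypergeom_moments(n_black, n_white, r)
closed_mean = r * p
closed_variance = r * p * q * (n - r) / (n - 1)
print(mean, variance, closed_mean, closed_variance)
```

The factor (n − r)/(n − 1) is what distinguishes the variance from that of the binomial with the same r and p.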

Correlation: notion of x, y, z being correlated from each being a function of p₁, p₂ ... pₙ: 
assuming that (1) all variations in p’s are small, (2) follow the normal law, (3) are independent, 
the general expression for the normal correlation distribution is deduced. 

Special case of two correlated variables: expression of the parameters in terms of 
N, σ₁, σ₂ and r, “Galton’s function”. Properties of the distribution; regressions, s.d.’s 
of arrays. The “best” value to give r, deduction of the product-sum formula. “This method 
of reckoning r has not been used for any system of correlated organs, but approximate 
methods, by no means the best, have been used by Galton, Weldon and Edgeworth.” 
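
A minimal sketch of the product-sum formula, taken here in its familiar form r = Σxy/(Nσ₁σ₂) with x and y measured from their means and the s.d.'s computed with divisor N. The data and the function name are illustrative only.

```python
import math


def product_sum_r(xs, ys):
    """Correlation coefficient by the product-sum formula, divisor N throughout."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]  # deviations from the mean
    dy = [y - mean_y for y in ys]
    sd_x = math.sqrt(sum(d * d for d in dx) / n)
    sd_y = math.sqrt(sum(d * d for d in dy) / n)
    return sum(a * b for a, b in zip(dx, dy)) / (n * sd_x * sd_y)


# Illustrative data: an exact linear relation, then the same with sign reversed.
xs = [1.0, 2.0, 3.0, 4.0]
print(product_sum_r(xs, [2.0, 4.0, 6.0, 8.0]))
print(product_sum_r(xs, [8.0, 6.0, 4.0, 2.0]))
```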

The standard deviation of the coefficient of correlation for a normal distribution 
(the erroneous value, in effect the standard error for determinate values of the standard 
deviations, corrected in memoir(5), p. 242). 

Contour lines of normal surface: Galton’s determination of the contours as ellipses and 
estimation of r from the vertical tangents. The slope of the principal axes: estimation of r 
from these axes, determined say by cutting the ellipses by circles. Estimation of r from 
the s.d. of arrays. The s.d.’s in direction of principal axes: expression for the normal surface 
referred to the principal axes. The property that the proportion of frequency falling outside 
the ellipse χ is e^(−½χ²) (Bravais): “probable ellipse” and “standard ellipse.” The proportion 
of frequency lying within a circle of given radius round the mode: table: approximate 
formulae. 
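
The Bravais property can be put to work in a short sketch, taking it in the form that the proportion of frequency outside the contour ellipse χ is exp(−½χ²). Two readings are assumptions here and are not stated in the notes: that the “probable ellipse” is the contour enclosing half the frequency, by analogy with the probable error, and that the “standard ellipse” is the contour χ = 1.

```python
import math


def fraction_outside(chi):
    """Proportion of a bivariate normal frequency falling outside the ellipse chi."""
    return math.exp(-0.5 * chi ** 2)


def chi_for_fraction_outside(fraction):
    """Contour parameter chi whose ellipse leaves the given proportion outside."""
    return math.sqrt(-2.0 * math.log(fraction))


# Assumed meaning: the probable ellipse leaves half the frequency outside.
probable_chi = chi_for_fraction_outside(0.5)

# Assumed meaning: the standard ellipse is the contour chi = 1.
standard_outside = fraction_outside(1.0)

print(probable_chi, standard_outside)
```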

Normal distribution for three variables: deduction of the general expression in terms of 
standard deviations and correlations. Correlation between father, mother and offspring 
as an example. The regression equation. The three-variable surface referred to principal 
axes: the contour ellipsoid: proportion of frequency outside a given ellipsoid: short table. 
The chance that an observation lies in a particular cone or polar element spreading out from 
the centre. 


Session 1895-1896 


“The following notes on skew correlation were begun Nov. 21st, 1895.” It is not clear 
why they begin so late in term. “Up to the present no theory of skew correlation exists and, 
although numerous observations involving the frequency of two variables are easily seen 
at once to be skew, no correlation surface has yet been fitted to such distributions. Hence 
whatever theory we adopt must be regarded as a trial, and its only justification must be 
that it suffices to describe observed statistics.” Three different approaches were tried. 

I. Hypothesis that the variations in two directions at right angles are independent, 
suggested by the normal surface. Relations of moments and product moments: the 
directions of independent variation are the principal axes. Problem: “Both independent 
variations being of the same type, what must that type be in order that every vertical 
section of the surface shall be also of the same type?”: proof that it must be the normal 
law. First four moments of such a surface about the principal axes in terms of moments 
parallel to the axes of measurement. Order of work: (1) Find first four moments of total 
distributions. (2) Determine principal axes. (3) Convert moments calculated to moments 
about principal axes, which determine the distributions for principal axes, say φ(x), f(y). 
(4) z = φ(x) f(y) is the equation to the surface. Note added at end: “The previous assumption 
(independent variation in two directions at right angles) was found not to work in the 
case of Perozzo’s age-at-marriage surface. This led to trial of a more extensive assumption.” 

II. (Feb. 1st, 1896.) Two directions of independent variation, not necessarily orthogonal. 
General case: deduction of condition that for n variables there shall be n directions of 
independent variation, not necessarily orthogonal. Special case of two variables: the 
directions of independent variation must be conjugate diameters of the ellipse of inertia: 
moments and product moments and their relations. Concluding note dated March 1896: 
“This theory was tried on a surface correlating the heights of barometers at two different 
stations and failed.” 

III. Hypergeometrical surface. A bag contains n balls, pn white, qn black: m balls 
are drawn (without return) and then a second lot of m′ balls. What is the chance that r 
of the m and s of the m′ are black? “One of the advantages of this form of the correlation 
ordinate is that the surface summed in the direction of either axis of correlation gives the 
very expression from which we have deduced skew variation curves; in other words we 
shall expect the curve formed by the sums parallel to either axis to be the skew curves 
we have already found to be applicable. Moreover, any section parallel to either of the axes 
of correlation is also a hypergeometrical series, i.e. a close approximation to a skew curve.” 
Attempts were made on two different lines to deduce a curved surface (1) by means of 
a slope-relation as for skew curves, (2) by approximating by Stirling’s theorem. The results 
are summarized as follows: “The attempt to get a surface parallel to the polyhedron of 
correlation which arises in ordinary chance problems leads us to values of the differentials 
of z (the ordinate) which, as far as we see, cannot be integrated. But these values of z 
show us two points of interest (1) that there is only one direction in any skew correlation 
surface in which the line of modes is a straight line and (2) in any other direction it is 
a cubic curve. Approximating to the ordinate of the same surface by Stirling’s theorem 
we obtain an equation which confirms the results of our first two trials (i.e. I and II), 
for there is no possibility of breaking up the expression into factors. The fact that the 
curve of regression is a cubic is also confirmed, and the form that may approximately be 
given to it, at least in the neighbourhood of the mode is also confirmed.” (March 26th, 
1896.) 
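
The two-stage drawing behind the hypergeometrical surface can be sketched as follows. The joint chance is written as the product of the chance for the first lot and the conditional chance for the second lot; this factorization is a direct reading of the problem and is not quoted from the notes. The bag sizes are hypothetical. The sketch also checks the quoted property that summing the surface parallel to an axis gives an ordinary hypergeometric series.

```python
from math import comb


def choose(n, k):
    """Binomial coefficient, taken as zero outside the range 0 <= k <= n."""
    return comb(n, k) if 0 <= k <= n else 0


def joint_prob(r, s, n_white, n_black, m, m2):
    """Chance of r black among the first m drawn and s black among the next m2."""
    n = n_white + n_black
    first = choose(n_black, r) * choose(n_white, m - r) / comb(n, m)
    second = (choose(n_black - r, s) * choose(n_white - (m - r), m2 - s)
              / comb(n - m, m2))
    return first * second


# Hypothetical bag: 6 white and 4 black balls; lots of m = 3 and m' = 2.
n_white, n_black, m, m2 = 6, 4, 3, 2

total = sum(joint_prob(r, s, n_white, n_black, m, m2)
            for r in range(m + 1) for s in range(m2 + 1))

# Sum over r for each s, to compare with a single drawing of m2 balls.
marginal_s = [sum(joint_prob(r, s, n_white, n_black, m, m2) for r in range(m + 1))
              for s in range(m2 + 1)]
single_draw = [choose(n_black, s) * choose(n_white, m2 - s) / comb(n_white + n_black, m2)
               for s in range(m2 + 1)]
print(total, marginal_s, single_draw)
```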

Reproductive Selection (April 23rd, 1896). The deduction of the formulae (i) to (iv) 
published in the “Note on Reproductive Selection”, Proc. Roy. Soc. LIX, p. 301, received 
Feb. 13th, 1896. 

The following section on probable errors is dated at end January 1896: this suggests 
the lectures were interpolated while the practical work on skew correlation was being done: 

Probable errors of skew curve constants: with a note on the differentials of Gamma- 
functions, and a short table. General theorem on the probable error of a mean: the approach 
is that of the method used in the memoir by Pearson and Filon(5). This is followed by a 
similar General Theorem on the Probable Error of any Constant. 

Theory of Errors (May 14th, 1896). Classification of types of error: theoretical errors, 
instrumental errors, personal errors. “Astronomers do not appear to have ever dealt with 
personal equations by the experimental method.” There are two points, the deviation of 
an observer from the truth and the mean deviation of his observations from his own mean. 
“Neither of these points has been really looked into. Error of judgment and variability 
of judgment are both important.” Preferences for particular digits noted: “for example, 
in 1000 readings by the same observer 0·3 occurred only 30, but 0·4, 170 times.” Accidental 
or irregular errors: the problem of the rejection of observations. Considerations arising 
as to the assumption of normality: the criteria never exactly fulfilled. To test effect of 
slight divergence from the normal, a distribution is considered composed of two half normal 
curves, numbers of observations above and below mean n₁ and n₂, s.d.’s σ₁ and σ₂. σ₂ is 
then written σ₁ + α; α is assumed small and n₁ − n₂ also small, and the moments evaluated, 
with the final result 


Probable errors of α and of the criterion in this neighbourhood. 

Sheppard’s Theorem for the correlation in terms of the frequencies in the four quadrants 
of a normal distribution divided at the medians: geometrical proof (Phil. Trans. Roy. Soc. 
A, CXCII (1898)). 
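
The notes do not write the theorem out. The sketch below assumes its usual statement: for a normal distribution divided at the medians, r = cos(π × proportion of the frequency in the two unlike-signed quadrants). The counts and the function name are illustrative.

```python
import math


def sheppard_r(like_signed, unlike_signed):
    """Correlation from the counts in like- and unlike-signed quadrants about the medians."""
    proportion_unlike = unlike_signed / (like_signed + unlike_signed)
    return math.cos(math.pi * proportion_unlike)


# Illustrative counts: frequency split evenly, three to one, and wholly like-signed.
print(sheppard_r(50, 50))
print(sheppard_r(75, 25))
print(sheppard_r(100, 0))
```

An even split between like and unlike quadrants corresponds to no correlation, and a distribution confined to the like-signed quadrants to perfect correlation.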

(vii) Frequency Curves and Correlation. By W. Palin Elderton. Third 
Edition. Cambridge, at the University Press, 1938. Price 12s. 6d. 


In the following review only those parts of the book will be dealt with, which have been 
altered since the former edition. For that reason we are specially interested in chapters 
10, 11 and 12, which, according to the preface of this third edition, have in many respects 
been rewritten. The headings of these chapters are: “Standard errors”, “The test of 
goodness of fit” and “The correlation ratio-contingency”. In the chapter concerned with the 
test of goodness of fit, R. A. Fisher’s opinion about the χ²-test is explained in greater detail 
than in earlier editions. The author has the sound opinion that: “when we merely want to 
compare several graduations of the same distribution we can often stop our work after the 
calculation of χ².” The methods for deducing standard errors (chapters 10 and 12) are 
more exact than before and treated in greater detail. 

In the other chapters, which have not been rewritten but only altered in one or another 
respect, we observe a short historical note about the normal curve of error, being a transition 
type of the Pearson curves. Further, in chapter 6, reference is made to the underlying theory 
of the A-series, which as in the former edition is stated as an alternative to the Pearson 
curves. In chapter 3 the method for working out the moments by iterated summations is 
simplified in the well-known way by first computing the factorial moments. A new and 
valuable Appendix (number 5 in the new edition) has been added, containing a short 
description of other methods than that of moments for estimating unknown constants. The 
methods described are (1) that of least squares, (2) that of maximum likelihood and 
(3) the minimum χ²-method. It is also worth mentioning that in Appendix 2 some account 
is given not only of the complete “beta”- and “gamma”-functions but also of the 
incomplete ones, and references are made to tables of these functions. At the end of the 
book there is, as in the former edition, a table of log Γ(p) but the new edition also contains 
brief tables of the normal curve of error and of the χ²-distribution. 

The new edition, like the earlier ones, is mainly a textbook for computers and specially 
for those wishing to apply Pearson curves to empirical distributions. Little has been added 
to these technical sides beyond what the former edition contained, and the 
arrangement of the book is maintained. For the further statistical analysis the author has 
in the new edition made valuable additions and alterations, the most important of which 
have been mentioned above. 


O. LUNDBERG. 
Stockholm, 1938. 








